Hi Upayavira,

I guess that is the problem. I am currently generating IDs with a function
that formats the current date and time down to the millisecond. This is the
function:

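// Requires java.text.SimpleDateFormat and java.util.Date.
public static String generateID() {
    Date dNow = new Date();
    // "HH" = 24-hour clock, "SSS" = milliseconds ("M" is the month and
    // "s" the seconds). Two calls within the same millisecond still
    // return the same ID.
    SimpleDateFormat ft = new SimpleDateFormat("yyMMddHHmmssSSS");
    return ft.format(dNow);
}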


I believe that, despite the millisecond precision of the ID, multiple objects
are being assigned the same ID. Can you suggest a better way to generate the
ID?
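A minimal sketch of why timestamp IDs collide, and of one common alternative, `java.util.UUID.randomUUID()`. It assumes the `id` field in the schema is a plain string; `IdSketch`, `timestampId` and `uuidId` are illustrative names only. In `SimpleDateFormat`, `M` is the month and `s` the seconds, so a pattern such as `yyMMddhhmmssMs` only changes once per second; `SSS` adds milliseconds, but `indexJSON` calls `generateID()` several times per event, which can easily happen within a single millisecond.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.UUID;

public class IdSketch {
    // Timestamp-based ID: identical for any two calls that fall in the same
    // time bucket of the pattern (one second for "yyMMddhhmmssMs",
    // one millisecond for "yyMMddHHmmssSSS").
    static String timestampId(Date when, String pattern) {
        return new SimpleDateFormat(pattern).format(when);
    }

    // Random (version 4) UUID: 122 random bits, so collisions are negligible
    // in practice no matter how quickly IDs are requested.
    static String uuidId() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        Date a = new Date(1437500000000L);  // two instants 1 ms apart
        Date b = new Date(1437500000001L);
        System.out.println(timestampId(a, "yyMMddhhmmssMs")
                .equals(timestampId(b, "yyMMddhhmmssMs")));   // true
        System.out.println(timestampId(a, "yyMMddHHmmssSSS")
                .equals(timestampId(b, "yyMMddHHmmssSSS")));  // false
        System.out.println(uuidId());
    }
}
```

In `indexJSON`, each `addField("id", generateID())` call could then use such a UUID instead of a timestamp.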

Regards,
Vineeth


On Tue, Jul 21, 2015 at 1:29 PM, Upayavira <u...@odoko.co.uk> wrote:

> Are you making sure that every document has a unique ID? Index into an
> empty Solr, then compare maxDoc with numDocs. If they differ (maxDoc is
> higher), some of your documents have been deleted, meaning some were
> overwritten.
>
> That might be a place to look.
>
> Upayavira
>
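A client-side complement to that check, as a sketch: record every ID before it is sent and report the ones that repeat. `DuplicateIds` and `findDuplicates` are hypothetical names; in the indexing code in this thread you would collect the values passed to `addField("id", ...)`. Assuming `id` is the schema's unique key, each repeated ID means one document replacing another.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DuplicateIds {
    // Returns every ID that occurs more than once, in order of first repeat.
    static Set<String> findDuplicates(List<String> ids) {
        Set<String> seen = new HashSet<>();
        Set<String> dups = new LinkedHashSet<>();
        for (String id : ids) {
            if (!seen.add(id)) {  // add() returns false if the ID was already seen
                dups.add(id);
            }
        }
        return dups;
    }
}
```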
> On Tue, Jul 21, 2015, at 09:24 PM, solr.user.1...@gmail.com wrote:
> > I can confirm this behavior. I see it when sending JSON docs in batches:
> > it never happens when sending them one by one, but happens sporadically
> > with batches.
> >
> > It is as if Solr/Jetty drops a couple of documents out of the batch.
> >
> > Regards
> >
> > > On 21 Jul 2015, at 21:38, Vineeth Dasaraju <vineeth.ii...@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > Thank you, Erick, for your input. I tried creating batches of 1000
> > > objects and indexing them into Solr. The performance is much better
> > > than before, but the number of indexed documents shown in the
> > > dashboard is lower than the number of documents I actually sent
> > > through SolrJ. My code is as follows:
> > >
> > > private static String SOLR_SERVER_URL = "http://localhost:8983/solr/newcore";
> > > private static String JSON_FILE_PATH = "/home/vineeth/week1_fixed.json";
> > > private static JSONParser parser = new JSONParser();
> > > private static SolrClient solr = new HttpSolrClient(SOLR_SERVER_URL);
> > >
> > > public static void main(String[] args) throws IOException,
> > > SolrServerException, ParseException {
> > >        File file = new File(JSON_FILE_PATH);
> > >        Scanner scn=new Scanner(file,"UTF-8");
> > >        JSONObject object;
> > >        int i = 0;
> > >        Collection<SolrInputDocument> batch = new
> > > ArrayList<SolrInputDocument>();
> > >        while(scn.hasNext()){
> > >            object= (JSONObject) parser.parse(scn.nextLine());
> > >            SolrInputDocument doc = indexJSON(object);
> > >            batch.add(doc);
> > >            i++;  // number of objects read so far
> > >            // Send a batch only once it holds 1000 documents.
> > >            if(batch.size() == 1000){
> > >                solr.add(batch);
> > >                System.out.println("Indexed " + i + " objects.");
> > >                batch = new ArrayList<SolrInputDocument>();
> > >            }
> > >        }
> > >        scn.close();
> > >        // Send the last partial batch (if any), then commit once.
> > >        if(!batch.isEmpty()){
> > >            solr.add(batch);
> > >        }
> > >        solr.commit();
> > >        System.out.println("Indexed " + i + " objects.");
> > > }
> > >
> > > public static SolrInputDocument indexJSON(JSONObject jsonOBJ)
> > >         throws ParseException, IOException, SolrServerException {
> > >    SolrInputDocument mainEvent = new SolrInputDocument();
> > >    mainEvent.addField("id", generateID());
> > >    mainEvent.addField("RawEventMessage", jsonOBJ.get("RawEventMessage"));
> > >    mainEvent.addField("EventUid", jsonOBJ.get("EventUid"));
> > >    mainEvent.addField("EventCollector", jsonOBJ.get("EventCollector"));
> > >    mainEvent.addField("EventMessageType", jsonOBJ.get("EventMessageType"));
> > >    mainEvent.addField("TimeOfEvent", jsonOBJ.get("TimeOfEvent"));
> > >    mainEvent.addField("TimeOfEventUTC", jsonOBJ.get("TimeOfEventUTC"));
> > >
> > >    Object obj = parser.parse(jsonOBJ.get("User").toString());
> > >    JSONObject userObj = (JSONObject) obj;
> > >
> > >    SolrInputDocument childUserEvent = new SolrInputDocument();
> > >    childUserEvent.addField("id", generateID());
> > >    childUserEvent.addField("User", userObj.get("User"));
> > >
> > >    obj = parser.parse(jsonOBJ.get("EventDescription").toString());
> > >    JSONObject eventdescriptionObj = (JSONObject) obj;
> > >
> > >    SolrInputDocument childEventDescEvent = new SolrInputDocument();
> > >    childEventDescEvent.addField("id", generateID());
> > >    childEventDescEvent.addField("EventApplicationName",
> > > eventdescriptionObj.get("EventApplicationName"));
> > >    childEventDescEvent.addField("Query", eventdescriptionObj.get("Query"));
> > >
> > >    obj = JSONValue.parse(eventdescriptionObj.get("Information").toString());
> > >    JSONArray informationArray = (JSONArray) obj;
> > >
> > >    for(int i = 0; i<informationArray.size(); i++){
> > >        JSONObject domain = (JSONObject) informationArray.get(i);
> > >
> > >        SolrInputDocument domainDoc = new SolrInputDocument();
> > >        domainDoc.addField("id", generateID());
> > >        domainDoc.addField("domainName", domain.get("domainName"));
> > >
> > >        String s = domain.get("columns").toString();
> > >        obj= JSONValue.parse(s);
> > >        JSONArray ColumnsArray = (JSONArray) obj;
> > >
> > >        SolrInputDocument columnsDoc = new SolrInputDocument();
> > >        columnsDoc.addField("id", generateID());
> > >
> > >        for(int j = 0; j<ColumnsArray.size(); j++){
> > >            JSONObject ColumnsObj = (JSONObject) ColumnsArray.get(j);
> > >            SolrInputDocument columnDoc = new SolrInputDocument();
> > >            columnDoc.addField("id", generateID());
> > >            columnDoc.addField("movieName", ColumnsObj.get("movieName"));
> > >            columnsDoc.addChildDocument(columnDoc);
> > >        }
> > >        domainDoc.addChildDocument(columnsDoc);
> > >        childEventDescEvent.addChildDocument(domainDoc);
> > >    }
> > >
> > >    mainEvent.addChildDocument(childEventDescEvent);
> > >    mainEvent.addChildDocument(childUserEvent);
> > >    return mainEvent;
> > > }
> > >
> > > I would be grateful if you could let me know what I am missing.
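A stdlib-only sketch of the size-based batching discussed in this thread: a batch is sent only when it is full, and whatever remains is sent once at the end. `BatchSketch` and `sendInBatches` are hypothetical names, and the `Consumer` stands in for `solr.add`, which throws checked exceptions, so real code would need a small wrapper or a plain loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchSketch {
    // Sends items to `sink` in lists of at most `batchSize` items and
    // returns the total number of items sent.
    static <T> int sendInBatches(Iterable<T> items, int batchSize, Consumer<List<T>> sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<T> batch = new ArrayList<>(batchSize);
        int count = 0;
        for (T item : items) {
            batch.add(item);
            count++;
            if (batch.size() == batchSize) {  // flush only when the batch is full
                sink.accept(batch);
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) {               // flush the final partial batch
            sink.accept(batch);
        }
        return count;
    }
}
```

With 2,500 documents and a batch size of 1,000, the sink receives batches of 1,000, 1,000 and 500; a single `solr.commit()` (or the server's autoCommit) would follow.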
> > >
> > > On Sun, Jul 19, 2015 at 2:16 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> > >
> > >> First thing is it looks like you're only sending one document at a
> > >> time, perhaps with child objects. This is not optimal at all. I
> > >> usually batch my docs up in groups of 1,000, and there is anecdotal
> > >> evidence that there may (depending on the docs) be some gains above
> > >> that number. Gotta balance the batch size off against how big the docs
> > >> are, of course.
> > >>
> > >> Assuming that you really are calling this method for one doc (and its
> > >> children) at a time, the far bigger problem, beyond calling
> > >> server.add for each parent/children block, is that you're then calling
> > >> solr.commit() every time. This is an anti-pattern. Generally, let the
> > >> autoCommit setting in solrconfig.xml handle the intermediate commits
> > >> while the indexing program is running, and only issue a commit at the
> > >> very end of the job, if at all.
> > >>
> > >> Best,
> > >> Erick
> > >>
> > >> On Sun, Jul 19, 2015 at 12:08 PM, Vineeth Dasaraju
> > >> <vineeth.ii...@gmail.com> wrote:
> > >>> Hi,
> > >>>
> > >>> I am trying to index JSON objects (which contain nested JSON objects
> > >>> and arrays) into Solr.
> > >>>
> > >>> My JSON object looks like the following (this is fake data that I am
> > >>> using for this example):
> > >>>
> > >>> {
> > >>>    "RawEventMessage": "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aliquam dolor orci, placerat ac pretium a, tincidunt consectetur mauris. Etiam sollicitudin sapien id odio tempus, non sodales odio iaculis. Donec fringilla diam at placerat interdum. Proin vitae arcu non augue facilisis auctor id non neque. Integer non nibh sit amet justo facilisis semper a vel ligula. Pellentesque commodo vulputate consequat. ",
> > >>>    "EventUid": "1279706565",
> > >>>    "TimeOfEvent": "2015-05-01-08-07-13",
> > >>>    "TimeOfEventUTC": "2015-05-01-01-07-13",
> > >>>    "EventCollector": "kafka",
> > >>>    "EventMessageType": "kafka-@column",
> > >>>    "User": {
> > >>>        "User": "Lorem ipsum",
> > >>>        "UserGroup": "Manager",
> > >>>        "Location": "consectetur adipiscing",
> > >>>        "Department": "Legal"
> > >>>    },
> > >>>    "EventDescription": {
> > >>>        "EventApplicationName": "",
> > >>>        "Query": "SELECT * FROM MOVIES",
> > >>>        "Information": [
> > >>>            {
> > >>>                "domainName": "English",
> > >>>                "columns": [
> > >>>                    {
> > >>>                        "movieName": "Casablanca",
> > >>>                        "duration": "154"
> > >>>                    },
> > >>>                    {
> > >>>                        "movieName": "Die Hard",
> > >>>                        "duration": "127"
> > >>>                    }
> > >>>                ]
> > >>>            },
> > >>>            {
> > >>>                "domainName": "Hindi",
> > >>>                "columns": [
> > >>>                    {
> > >>>                        "movieName": "DDLJ",
> > >>>                        "duration": "176"
> > >>>                    }
> > >>>                ]
> > >>>            }
> > >>>        ]
> > >>>    }
> > >>> }
> > >>>
> > >>>
> > >>>
> > >>> My function for indexing the object is as follows:
> > >>>
> > >>> public static void indexJSON(JSONObject jsonOBJ)
> > >>>         throws ParseException, IOException, SolrServerException {
> > >>>    Collection<SolrInputDocument> batch = new
> > >>> ArrayList<SolrInputDocument>();
> > >>>
> > >>>    SolrInputDocument mainEvent = new SolrInputDocument();
> > >>>    mainEvent.addField("id", generateID());
> > >>>    mainEvent.addField("RawEventMessage", jsonOBJ.get("RawEventMessage"));
> > >>>    mainEvent.addField("EventUid", jsonOBJ.get("EventUid"));
> > >>>    mainEvent.addField("EventCollector", jsonOBJ.get("EventCollector"));
> > >>>    mainEvent.addField("EventMessageType", jsonOBJ.get("EventMessageType"));
> > >>>    mainEvent.addField("TimeOfEvent", jsonOBJ.get("TimeOfEvent"));
> > >>>    mainEvent.addField("TimeOfEventUTC", jsonOBJ.get("TimeOfEventUTC"));
> > >>>
> > >>>    Object obj = parser.parse(jsonOBJ.get("User").toString());
> > >>>    JSONObject userObj = (JSONObject) obj;
> > >>>
> > >>>    SolrInputDocument childUserEvent = new SolrInputDocument();
> > >>>    childUserEvent.addField("id", generateID());
> > >>>    childUserEvent.addField("User", userObj.get("User"));
> > >>>
> > >>>    obj = parser.parse(jsonOBJ.get("EventDescription").toString());
> > >>>    JSONObject eventdescriptionObj = (JSONObject) obj;
> > >>>
> > >>>    SolrInputDocument childEventDescEvent = new SolrInputDocument();
> > >>>    childEventDescEvent.addField("id", generateID());
> > >>>    childEventDescEvent.addField("EventApplicationName",
> > >>> eventdescriptionObj.get("EventApplicationName"));
> > >>>    childEventDescEvent.addField("Query", eventdescriptionObj.get("Query"));
> > >>>
> > >>>    obj = JSONValue.parse(eventdescriptionObj.get("Information").toString());
> > >>>    JSONArray informationArray = (JSONArray) obj;
> > >>>
> > >>>    for(int i = 0; i<informationArray.size(); i++){
> > >>>        JSONObject domain = (JSONObject) informationArray.get(i);
> > >>>
> > >>>        SolrInputDocument domainDoc = new SolrInputDocument();
> > >>>        domainDoc.addField("id", generateID());
> > >>>        domainDoc.addField("domainName", domain.get("domainName"));
> > >>>
> > >>>        String s = domain.get("columns").toString();
> > >>>        obj= JSONValue.parse(s);
> > >>>        JSONArray ColumnsArray = (JSONArray) obj;
> > >>>
> > >>>        SolrInputDocument columnsDoc = new SolrInputDocument();
> > >>>        columnsDoc.addField("id", generateID());
> > >>>
> > >>>        for(int j = 0; j<ColumnsArray.size(); j++){
> > >>>            JSONObject ColumnsObj = (JSONObject) ColumnsArray.get(j);
> > >>>            SolrInputDocument columnDoc = new SolrInputDocument();
> > >>>            columnDoc.addField("id", generateID());
> > >>>            columnDoc.addField("movieName", ColumnsObj.get("movieName"));
> > >>>            columnsDoc.addChildDocument(columnDoc);
> > >>>        }
> > >>>        domainDoc.addChildDocument(columnsDoc);
> > >>>        childEventDescEvent.addChildDocument(domainDoc);
> > >>>    }
> > >>>
> > >>>    mainEvent.addChildDocument(childEventDescEvent);
> > >>>    mainEvent.addChildDocument(childUserEvent);
> > >>>    batch.add(mainEvent);
> > >>>    solr.add(batch);
> > >>>    solr.commit();
> > >>> }
> > >>>
> > >>> When I index using the above code, I can index only 12 objects per
> > >>> second. Is there a faster way to do the indexing? I believe I am using
> > >>> the json-fast parser, which is one of the fastest JSON parsers.
> > >>>
> > >>> Your help will be very valuable to me.
> > >>>
> > >>> Thanks,
> > >>> Vineeth
> > >>
>
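To see how many IDs a single event consumes: the `indexJSON` method in this thread builds one main document, one user child, one event-description child and, for each `Information` domain, a domain document, a columns document and one document per column. A sketch of that arithmetic (`DocCount` and `docsPerEvent` are hypothetical names) for the sample event, whose two domains have 2 and 1 columns:

```java
public class DocCount {
    // Documents built by indexJSON for one event, given the number of
    // "columns" entries in each "Information" domain.
    static int docsPerEvent(int[] columnsPerDomain) {
        int docs = 3;                 // mainEvent, childUserEvent, childEventDescEvent
        for (int columns : columnsPerDomain) {
            docs += 2 + columns;      // domainDoc, columnsDoc, one columnDoc per column
        }
        return docs;
    }

    public static void main(String[] args) {
        // Sample event: "English" has 2 columns, "Hindi" has 1.
        System.out.println(docsPerEvent(new int[] {2, 1}));  // 10
    }
}
```

Each of those documents calls `generateID()`, so ten timestamp-based IDs are requested for one event in quick succession, which makes it very likely that several of them are identical.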
