Hi Erick,

As you correctly pointed out, the main reason documents were
disappearing was that I was assigning the same id to multiple
documents. This was resolved after I switched to UUIDs as suggested by
Mohsen. Thank you for your inputs.
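
For reference, this is roughly what the generator looks like now (a
minimal sketch; the method name generateID is simply what my code
already used):

import java.util.UUID;

public static String generateID() {
    // A type-4 (random) UUID: effectively collision-free, even for
    // documents created within the same millisecond.
    return UUID.randomUUID().toString();
}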

Regards,
Vineeth

On Wed, Jul 22, 2015 at 9:39 AM, Erick Erickson <erickerick...@gmail.com>
wrote:

> The other classic error is not sending the final batch after the loop,
> but at a glance that's not a problem for you: after the while loop you
> send the batch, which catches any docs left over.
>
> solr.user, that might be your problem? Because I've never seen
> this happen.
>
> On Tue, Jul 21, 2015 at 1:47 PM, Fadi Mohsen <fadi.moh...@gmail.com>
> wrote:
> > In Java: UUID.randomUUID();
> >
> > That is what I'm using.
> >
> > Regards
> >
> >> On 21 Jul 2015, at 22:38, Vineeth Dasaraju <vineeth.ii...@gmail.com>
> >> wrote:
> >>
> >> Hi Upayavira,
> >>
> >> I guess that is the problem. I am currently using a function for
> >> generating an ID. It takes the current date and time down to
> >> milliseconds. This is the function:
> >>
> >> public static String generateID(){
> >>        Date dNow = new Date();
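> >>        // NOTE: in SimpleDateFormat, "hh" is the 12-hour clock and the
> >>        // trailing "Ms" parses as month + second, not milliseconds
> >>        // ("SSS" would be needed), so documents created within the same
> >>        // second receive identical IDs.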
> >>        SimpleDateFormat ft = new SimpleDateFormat("yyMMddhhmmssMs");
> >>        String datetime = ft.format(dNow);
> >>        return datetime;
> >>    }
> >>
> >>
> >> I believe that despite the intended millisecond precision in the ID
> >> generation, multiple objects are being assigned the same ID. Can you
> >> suggest a better way to generate the ID?
> >>
> >> Regards,
> >> Vineeth
> >>
> >>
> >>> On Tue, Jul 21, 2015 at 1:29 PM, Upayavira <u...@odoko.co.uk> wrote:
> >>>
> >>> Are you making sure that every document has a unique ID? Index into an
> >>> empty Solr, then look at your maxdocs vs numdocs. If they are different
> >>> (maxdocs is higher) then some of your documents have been deleted,
> >>> meaning some were overwritten.
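> >>>
> >>> In SolrJ you can read both numbers through the Luke request handler;
> >>> a minimal sketch (method names from memory, so double-check them):
> >>>
> >>> LukeRequest luke = new LukeRequest();
> >>> luke.setNumTerms(0);
> >>> LukeResponse rsp = luke.process(solr);
> >>> // numDocs excludes deleted documents; maxDoc still counts them
> >>> System.out.println("numDocs=" + rsp.getNumDocs() + ", maxDoc=" + rsp.getMaxDoc());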
> >>>
> >>> That might be a place to look.
> >>>
> >>> Upayavira
> >>>
> >>>> On Tue, Jul 21, 2015, at 09:24 PM, solr.user.1...@gmail.com wrote:
> >>>> I can confirm this behavior: seen when sending json docs in batches,
> >>>> never when sending one by one, but sporadic when sending batches.
> >>>>
> >>>> As if solr/jetty drops a couple of documents out of the batch.
> >>>>
> >>>> Regards
> >>>>
> >>>>> On 21 Jul 2015, at 21:38, Vineeth Dasaraju <vineeth.ii...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> Thank you, Erick, for your inputs. I tried creating batches of 1000
> >>>>> objects and indexing them to Solr. The performance is way better than
> >>>>> before, but I find that the number of indexed documents shown in the
> >>>>> dashboard is lower than the number of documents I actually indexed
> >>>>> through solrj. My code is as follows:
> >>>>>
> >>>>> private static String SOLR_SERVER_URL = "http://localhost:8983/solr/newcore";
> >>>>> private static String JSON_FILE_PATH = "/home/vineeth/week1_fixed.json";
> >>>>> private static JSONParser parser = new JSONParser();
> >>>>> private static SolrClient solr = new HttpSolrClient(SOLR_SERVER_URL);
> >>>>>
> >>>>> public static void main(String[] args) throws IOException,
> >>>>> SolrServerException, ParseException {
> >>>>>     File file = new File(JSON_FILE_PATH);
> >>>>>     Scanner scn = new Scanner(file, "UTF-8");
> >>>>>     JSONObject object;
> >>>>>     int i = 0;
> >>>>>     Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
> >>>>>     while (scn.hasNext()) {
> >>>>>         object = (JSONObject) parser.parse(scn.nextLine());
> >>>>>         SolrInputDocument doc = indexJSON(object);
> >>>>>         batch.add(doc);
> >>>>>         // i starts at 0, so the first add sends a batch of one;
> >>>>>         // after that, each batch holds 1000 documents.
> >>>>>         if (i % 1000 == 0) {
> >>>>>             System.out.println("Indexed " + (i + 1) + " objects.");
> >>>>>             solr.add(batch);
> >>>>>             batch = new ArrayList<SolrInputDocument>();
> >>>>>         }
> >>>>>         i++;
> >>>>>     }
> >>>>>     // Send any leftover documents, then commit once at the end.
> >>>>>     solr.add(batch);
> >>>>>     solr.commit();
> >>>>>     System.out.println("Indexed " + i + " objects.");
> >>>>> }
> >>>>>
> >>>>> public static SolrInputDocument indexJSON(JSONObject jsonOBJ) throws
> >>>>> ParseException, IOException, SolrServerException {
> >>>>>
> >>>>>   // Build one parent document; nested JSON objects become child documents.
> >>>>>   SolrInputDocument mainEvent = new SolrInputDocument();
> >>>>>   mainEvent.addField("id", generateID());
> >>>>>   mainEvent.addField("RawEventMessage",
> >>> jsonOBJ.get("RawEventMessage"));
> >>>>>   mainEvent.addField("EventUid", jsonOBJ.get("EventUid"));
> >>>>>   mainEvent.addField("EventCollector",
> jsonOBJ.get("EventCollector"));
> >>>>>   mainEvent.addField("EventMessageType",
> >>> jsonOBJ.get("EventMessageType"));
> >>>>>   mainEvent.addField("TimeOfEvent", jsonOBJ.get("TimeOfEvent"));
> >>>>>   mainEvent.addField("TimeOfEventUTC",
> jsonOBJ.get("TimeOfEventUTC"));
> >>>>>
> >>>>>   Object obj = parser.parse(jsonOBJ.get("User").toString());
> >>>>>   JSONObject userObj = (JSONObject) obj;
> >>>>>
> >>>>>   SolrInputDocument childUserEvent = new SolrInputDocument();
> >>>>>   childUserEvent.addField("id", generateID());
> >>>>>   childUserEvent.addField("User", userObj.get("User"));
> >>>>>
> >>>>>   obj = parser.parse(jsonOBJ.get("EventDescription").toString());
> >>>>>   JSONObject eventdescriptionObj = (JSONObject) obj;
> >>>>>
> >>>>>   SolrInputDocument childEventDescEvent = new SolrInputDocument();
> >>>>>   childEventDescEvent.addField("id", generateID());
> >>>>>   childEventDescEvent.addField("EventApplicationName",
> >>>>> eventdescriptionObj.get("EventApplicationName"));
> >>>>>   childEventDescEvent.addField("Query",
> >>> eventdescriptionObj.get("Query"));
> >>>>>
> >>>>>   obj=
> >>> JSONValue.parse(eventdescriptionObj.get("Information").toString());
> >>>>>   JSONArray informationArray = (JSONArray) obj;
> >>>>>
> >>>>>   for(int i = 0; i<informationArray.size(); i++){
> >>>>>       JSONObject domain = (JSONObject) informationArray.get(i);
> >>>>>
> >>>>>       SolrInputDocument domainDoc = new SolrInputDocument();
> >>>>>       domainDoc.addField("id", generateID());
> >>>>>       domainDoc.addField("domainName", domain.get("domainName"));
> >>>>>
> >>>>>       String s = domain.get("columns").toString();
> >>>>>       obj= JSONValue.parse(s);
> >>>>>       JSONArray ColumnsArray = (JSONArray) obj;
> >>>>>
> >>>>>       SolrInputDocument columnsDoc = new SolrInputDocument();
> >>>>>       columnsDoc.addField("id", generateID());
> >>>>>
> >>>>>       for(int j = 0; j<ColumnsArray.size(); j++){
> >>>>>           JSONObject ColumnsObj = (JSONObject) ColumnsArray.get(j);
> >>>>>           SolrInputDocument columnDoc = new SolrInputDocument();
> >>>>>           columnDoc.addField("id", generateID());
> >>>>>           columnDoc.addField("movieName",
> >>> ColumnsObj.get("movieName"));
> >>>>>           columnsDoc.addChildDocument(columnDoc);
> >>>>>       }
> >>>>>       domainDoc.addChildDocument(columnsDoc);
> >>>>>       childEventDescEvent.addChildDocument(domainDoc);
> >>>>>   }
> >>>>>
> >>>>>   mainEvent.addChildDocument(childEventDescEvent);
> >>>>>   mainEvent.addChildDocument(childUserEvent);
> >>>>>   return mainEvent;
> >>>>> }
> >>>>>
> >>>>> I would be grateful if you could let me know what I am missing.
> >>>>>
> >>>>> On Sun, Jul 19, 2015 at 2:16 PM, Erick Erickson <erickerick...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> First thing: it looks like you're only sending one document at a
> >>>>>> time, perhaps with child objects. This is not optimal at all. I
> >>>>>> usually batch my docs up in groups of 1,000, and there is anecdotal
> >>>>>> evidence that there may (depending on the docs) be some gains above
> >>>>>> that number. Gotta balance the batch size off against how big the
> >>>>>> docs are, of course.
> >>>>>>
> >>>>>> Assuming that you really are calling this method for one doc (and
> >>>>>> children) at a time, the far bigger problem, beyond calling
> >>>>>> server.add for each parent and its children, is that you're then
> >>>>>> calling solr.commit() every time. This is an anti-pattern. Generally,
> >>>>>> let the autoCommit setting in solrconfig.xml handle the intermediate
> >>>>>> commits while the indexing program is running, and only issue a
> >>>>>> commit at the very end of the job, if at all.
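> >>>>>>
> >>>>>> The relevant solrconfig.xml section looks something like this; the
> >>>>>> values here are just examples, tune them to your indexing load:
> >>>>>>
> >>>>>> <autoCommit>
> >>>>>>   <maxTime>60000</maxTime>           <!-- hard commit at most every 60s -->
> >>>>>>   <openSearcher>false</openSearcher> <!-- don't open a searcher on hard commit -->
> >>>>>> </autoCommit>
> >>>>>> <autoSoftCommit>
> >>>>>>   <maxTime>5000</maxTime>            <!-- soft commit makes docs visible -->
> >>>>>> </autoSoftCommit>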
> >>>>>>
> >>>>>> Best,
> >>>>>> Erick
> >>>>>>
> >>>>>> On Sun, Jul 19, 2015 at 12:08 PM, Vineeth Dasaraju
> >>>>>> <vineeth.ii...@gmail.com> wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I am trying to index JSON objects (which contain nested JSON
> >>>>>>> objects and arrays) into Solr.
> >>>>>>>
> >>>>>>> My JSON object looks like the following (this is fake data that I
> >>>>>>> am using for this example):
> >>>>>>>
> >>>>>>> {
> >>>>>>>   "RawEventMessage": "Lorem ipsum dolor sit amet, consectetur
> >>>>>> adipiscing
> >>>>>>> elit. Aliquam dolor orci, placerat ac pretium a, tincidunt
> >>> consectetur
> >>>>>>> mauris. Etiam sollicitudin sapien id odio tempus, non sodales odio
> >>>>>> iaculis.
> >>>>>>> Donec fringilla diam at placerat interdum. Proin vitae arcu non
> augue
> >>>>>>> facilisis auctor id non neque. Integer non nibh sit amet justo
> >>> facilisis
> >>>>>>> semper a vel ligula. Pellentesque commodo vulputate consequat. ",
> >>>>>>>   "EventUid": "1279706565",
> >>>>>>>   "TimeOfEvent": "2015-05-01-08-07-13",
> >>>>>>>   "TimeOfEventUTC": "2015-05-01-01-07-13",
> >>>>>>>   "EventCollector": "kafka",
> >>>>>>>   "EventMessageType": "kafka-@column",
> >>>>>>>   "User": {
> >>>>>>>       "User": "Lorem ipsum",
> >>>>>>>       "UserGroup": "Manager",
> >>>>>>>       "Location": "consectetur adipiscing",
> >>>>>>>       "Department": "Legal"
> >>>>>>>   },
> >>>>>>>   "EventDescription": {
> >>>>>>>       "EventApplicationName": "",
> >>>>>>>       "Query": "SELECT * FROM MOVIES",
> >>>>>>>       "Information": [
> >>>>>>>           {
> >>>>>>>               "domainName": "English",
> >>>>>>>               "columns": [
> >>>>>>>                   {
> >>>>>>>                       "movieName": "Casablanca",
> >>>>>>>                       "duration": "154"
> >>>>>>>                   },
> >>>>>>>                   {
> >>>>>>>                       "movieName": "Die Hard",
> >>>>>>>                       "duration": "127"
> >>>>>>>                   }
> >>>>>>>               ]
> >>>>>>>           },
> >>>>>>>           {
> >>>>>>>               "domainName": "Hindi",
> >>>>>>>               "columns": [
> >>>>>>>                   {
> >>>>>>>                       "movieName": "DDLJ",
> >>>>>>>                       "duration": "176"
> >>>>>>>                   }
> >>>>>>>               ]
> >>>>>>>           }
> >>>>>>>       ]
> >>>>>>>   }
> >>>>>>> }
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> My function for indexing the object is as follows:
> >>>>>>>
> >>>>>>> public static void indexJSON(JSONObject jsonOBJ) throws
> >>>>>>> ParseException, IOException, SolrServerException {
> >>>>>>>   Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
> >>>>>>>
> >>>>>>>   SolrInputDocument mainEvent = new SolrInputDocument();
> >>>>>>>   mainEvent.addField("id", generateID());
> >>>>>>>   mainEvent.addField("RawEventMessage",
> >>>>>> jsonOBJ.get("RawEventMessage"));
> >>>>>>>   mainEvent.addField("EventUid", jsonOBJ.get("EventUid"));
> >>>>>>>   mainEvent.addField("EventCollector",
> >>> jsonOBJ.get("EventCollector"));
> >>>>>>>   mainEvent.addField("EventMessageType",
> >>>>>> jsonOBJ.get("EventMessageType"));
> >>>>>>>   mainEvent.addField("TimeOfEvent", jsonOBJ.get("TimeOfEvent"));
> >>>>>>>   mainEvent.addField("TimeOfEventUTC",
> >>> jsonOBJ.get("TimeOfEventUTC"));
> >>>>>>>
> >>>>>>>   Object obj = parser.parse(jsonOBJ.get("User").toString());
> >>>>>>>   JSONObject userObj = (JSONObject) obj;
> >>>>>>>
> >>>>>>>   SolrInputDocument childUserEvent = new SolrInputDocument();
> >>>>>>>   childUserEvent.addField("id", generateID());
> >>>>>>>   childUserEvent.addField("User", userObj.get("User"));
> >>>>>>>
> >>>>>>>   obj = parser.parse(jsonOBJ.get("EventDescription").toString());
> >>>>>>>   JSONObject eventdescriptionObj = (JSONObject) obj;
> >>>>>>>
> >>>>>>>   SolrInputDocument childEventDescEvent = new SolrInputDocument();
> >>>>>>>   childEventDescEvent.addField("id", generateID());
> >>>>>>>   childEventDescEvent.addField("EventApplicationName",
> >>>>>>> eventdescriptionObj.get("EventApplicationName"));
> >>>>>>>   childEventDescEvent.addField("Query",
> >>>>>> eventdescriptionObj.get("Query"));
> >>>>>>>
> >>>>>>>   obj=
> >>>>>> JSONValue.parse(eventdescriptionObj.get("Information").toString());
> >>>>>>>   JSONArray informationArray = (JSONArray) obj;
> >>>>>>>
> >>>>>>>   for(int i = 0; i<informationArray.size(); i++){
> >>>>>>>       JSONObject domain = (JSONObject) informationArray.get(i);
> >>>>>>>
> >>>>>>>       SolrInputDocument domainDoc = new SolrInputDocument();
> >>>>>>>       domainDoc.addField("id", generateID());
> >>>>>>>       domainDoc.addField("domainName", domain.get("domainName"));
> >>>>>>>
> >>>>>>>       String s = domain.get("columns").toString();
> >>>>>>>       obj= JSONValue.parse(s);
> >>>>>>>       JSONArray ColumnsArray = (JSONArray) obj;
> >>>>>>>
> >>>>>>>       SolrInputDocument columnsDoc = new SolrInputDocument();
> >>>>>>>       columnsDoc.addField("id", generateID());
> >>>>>>>
> >>>>>>>       for(int j = 0; j<ColumnsArray.size(); j++){
> >>>>>>>           JSONObject ColumnsObj = (JSONObject) ColumnsArray.get(j);
> >>>>>>>           SolrInputDocument columnDoc = new SolrInputDocument();
> >>>>>>>           columnDoc.addField("id", generateID());
> >>>>>>>           columnDoc.addField("movieName",
> >>> ColumnsObj.get("movieName"));
> >>>>>>>           columnsDoc.addChildDocument(columnDoc);
> >>>>>>>       }
> >>>>>>>       domainDoc.addChildDocument(columnsDoc);
> >>>>>>>       childEventDescEvent.addChildDocument(domainDoc);
> >>>>>>>   }
> >>>>>>>
> >>>>>>>   mainEvent.addChildDocument(childEventDescEvent);
> >>>>>>>   mainEvent.addChildDocument(childUserEvent);
> >>>>>>>   batch.add(mainEvent);
> >>>>>>>   solr.add(batch);
> >>>>>>>   solr.commit();
> >>>>>>> }
> >>>>>>>
> >>>>>>> When I try to index using the above code, I am able to index only
> >>>>>>> 12 objects per second. Is there a faster way to do the indexing? I
> >>>>>>> believe I am using the json-fast parser, which is one of the fastest
> >>>>>>> parsers for JSON.
> >>>>>>>
> >>>>>>> Your help will be very valuable to me.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Vineeth
> >>>
>
