Sorry, the "ID" mistake was pointed out by Upayavira. Thank you, Upayavira!

On Wed, Jul 22, 2015 at 10:56 AM, Vineeth Dasaraju <vineeth.ii...@gmail.com>
wrote:

> Hi Erick,
>
> As you correctly pointed out, the main reason documents were
> disappearing was that I was assigning the same id to multiple documents.
> This got resolved after I used a UUID as suggested by Mohsen. Thank you
> for your inputs.
>
> Regards,
> Vineeth
>
> On Wed, Jul 22, 2015 at 9:39 AM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> The other classic error is not sending the final batch at the end, but
>> at a glance that's not a problem for you: after the while loop
>> you send the batch, which catches any docs left over.
>>
>> solr.user, might that be your problem? I've never seen
>> this happen otherwise.
>>
>> On Tue, Jul 21, 2015 at 1:47 PM, Fadi Mohsen <fadi.moh...@gmail.com>
>> wrote:
>> > In Java: UUID.randomUUID();
>> >
>> > That is what I'm using.
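>> >
>> > If it helps, a minimal sketch of a drop-in id generator along those lines
>> > (the method name generateID just mirrors the one used elsewhere in this
>> > thread):
>> >
>> > import java.util.UUID;
>> >
>> > public static String generateID() {
>> >     // A random (version 4) UUID carries 122 random bits, so collisions
>> >     // are practically impossible, unlike a timestamp-based id.
>> >     return UUID.randomUUID().toString();
>> > }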
>> >
>> > Regards
>> >
>> >> On 21 Jul 2015, at 22:38, Vineeth Dasaraju <vineeth.ii...@gmail.com> wrote:
>> >>
>> >> Hi Upayavira,
>> >>
>> >> I guess that is the problem. I am currently using a function for
>> >> generating an ID. It takes the current date and time, to milliseconds,
>> >> and generates the id. This is the function:
>> >>
>> >> public static String generateID(){
>> >>     // NB: in SimpleDateFormat, 'M' is month and 's' is seconds; the
>> >>     // millisecond pattern is 'SSS'. So "yyMMddhhmmssMs" has only
>> >>     // second-level granularity, and every doc created within the same
>> >>     // second gets the same id.
>> >>     Date dNow = new Date();
>> >>     SimpleDateFormat ft = new SimpleDateFormat("yyMMddhhmmssMs");
>> >>     String datetime = ft.format(dNow);
>> >>     return datetime;
>> >> }
>> >>
>> >>
>> >> I believe that despite having millisecond precision in the id
>> >> generation, multiple objects are being assigned the same ID. Can you
>> >> suggest a better way to generate the ID?
>> >>
>> >> Regards,
>> >> Vineeth
>> >>
>> >>
>> >>> On Tue, Jul 21, 2015 at 1:29 PM, Upayavira <u...@odoko.co.uk> wrote:
>> >>>
>> >>> Are you making sure that every document has a unique ID? Index into an
>> >>> empty Solr, then look at your maxDoc vs numDocs. If they are different
>> >>> (maxDoc is higher), then some of your documents have been deleted,
>> >>> meaning some were overwritten.
>> >>>
>> >>> That might be a place to look.
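>> >>>
>> >>> A quick way to check is the Luke handler; here is a sketch using SolrJ's
>> >>> Luke helpers, assuming the HttpSolrClient from earlier in the thread:
>> >>>
>> >>> import org.apache.solr.client.solrj.request.LukeRequest;
>> >>> import org.apache.solr.client.solrj.response.LukeResponse;
>> >>>
>> >>> LukeRequest luke = new LukeRequest();
>> >>> luke.setNumTerms(0);                    // skip per-field term stats
>> >>> LukeResponse rsp = luke.process(solr);
>> >>> // numDocs < maxDoc means deleted (here: overwritten) documents
>> >>> System.out.println("numDocs = " + rsp.getIndexInfo().get("numDocs"));
>> >>> System.out.println("maxDoc  = " + rsp.getIndexInfo().get("maxDoc"));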
>> >>>
>> >>> Upayavira
>> >>>
>> >>>> On Tue, Jul 21, 2015, at 09:24 PM, solr.user.1...@gmail.com wrote:
>> >>>> I can confirm this behavior, seen when sending JSON docs in batches; it
>> >>>> never happens when sending one by one, but is sporadic when sending
>> >>>> batches.
>> >>>>
>> >>>> It is as if Solr/Jetty drops a couple of documents out of the batch.
>> >>>>
>> >>>> Regards
>> >>>>
>> >>>>> On 21 Jul 2015, at 21:38, Vineeth Dasaraju <vineeth.ii...@gmail.com> wrote:
>> >>>>>
>> >>>>> Hi,
>> >>>>>
>> >>>>> Thank you, Erick, for your inputs. I tried creating batches of 1000
>> >>>>> objects and indexing them to Solr. The performance is way better than
>> >>>>> before, but I find that the number of indexed documents shown in the
>> >>>>> dashboard is lower than the number of documents I actually indexed
>> >>>>> through SolrJ. My code is as follows:
>> >>>>>
>> >>>>> private static String SOLR_SERVER_URL = "http://localhost:8983/solr/newcore";
>> >>>>> private static String JSON_FILE_PATH = "/home/vineeth/week1_fixed.json";
>> >>>>> private static JSONParser parser = new JSONParser();
>> >>>>> private static SolrClient solr = new HttpSolrClient(SOLR_SERVER_URL);
>> >>>>>
>> >>>>> public static void main(String[] args) throws IOException,
>> >>>>>         SolrServerException, ParseException {
>> >>>>>     File file = new File(JSON_FILE_PATH);
>> >>>>>     Scanner scn = new Scanner(file, "UTF-8");
>> >>>>>     JSONObject object;
>> >>>>>     int i = 0;
>> >>>>>     Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
>> >>>>>     while (scn.hasNext()) {
>> >>>>>         object = (JSONObject) parser.parse(scn.nextLine());
>> >>>>>         SolrInputDocument doc = indexJSON(object);
>> >>>>>         batch.add(doc);
>> >>>>>         // i starts at 0, so the first batch is sent after a single doc;
>> >>>>>         // every batch after that holds 1000 docs
>> >>>>>         if (i % 1000 == 0) {
>> >>>>>             System.out.println("Indexed " + (i + 1) + " objects.");
>> >>>>>             solr.add(batch);
>> >>>>>             batch = new ArrayList<SolrInputDocument>();
>> >>>>>         }
>> >>>>>         i++;
>> >>>>>     }
>> >>>>>     solr.add(batch); // send the leftover partial batch
>> >>>>>     solr.commit();
>> >>>>>     System.out.println("Indexed " + i + " objects.");
>> >>>>> }
>> >>>>>
>> >>>>> public static SolrInputDocument indexJSON(JSONObject jsonOBJ) throws
>> >>>>>         ParseException, IOException, SolrServerException {
>> >>>>>     // Parent document: one per top-level event
>> >>>>>     SolrInputDocument mainEvent = new SolrInputDocument();
>> >>>>>     mainEvent.addField("id", generateID());
>> >>>>>     mainEvent.addField("RawEventMessage", jsonOBJ.get("RawEventMessage"));
>> >>>>>     mainEvent.addField("EventUid", jsonOBJ.get("EventUid"));
>> >>>>>     mainEvent.addField("EventCollector", jsonOBJ.get("EventCollector"));
>> >>>>>     mainEvent.addField("EventMessageType", jsonOBJ.get("EventMessageType"));
>> >>>>>     mainEvent.addField("TimeOfEvent", jsonOBJ.get("TimeOfEvent"));
>> >>>>>     mainEvent.addField("TimeOfEventUTC", jsonOBJ.get("TimeOfEventUTC"));
>> >>>>>
>> >>>>>     Object obj = parser.parse(jsonOBJ.get("User").toString());
>> >>>>>     JSONObject userObj = (JSONObject) obj;
>> >>>>>
>> >>>>>     SolrInputDocument childUserEvent = new SolrInputDocument();
>> >>>>>     childUserEvent.addField("id", generateID());
>> >>>>>     childUserEvent.addField("User", userObj.get("User"));
>> >>>>>
>> >>>>>     obj = parser.parse(jsonOBJ.get("EventDescription").toString());
>> >>>>>     JSONObject eventdescriptionObj = (JSONObject) obj;
>> >>>>>
>> >>>>>     SolrInputDocument childEventDescEvent = new SolrInputDocument();
>> >>>>>     childEventDescEvent.addField("id", generateID());
>> >>>>>     childEventDescEvent.addField("EventApplicationName",
>> >>>>>             eventdescriptionObj.get("EventApplicationName"));
>> >>>>>     childEventDescEvent.addField("Query", eventdescriptionObj.get("Query"));
>> >>>>>
>> >>>>>     obj = JSONValue.parse(eventdescriptionObj.get("Information").toString());
>> >>>>>     JSONArray informationArray = (JSONArray) obj;
>> >>>>>
>> >>>>>     // One child doc per domain, with grandchild docs for its columns
>> >>>>>     for (int i = 0; i < informationArray.size(); i++) {
>> >>>>>         JSONObject domain = (JSONObject) informationArray.get(i);
>> >>>>>
>> >>>>>         SolrInputDocument domainDoc = new SolrInputDocument();
>> >>>>>         domainDoc.addField("id", generateID());
>> >>>>>         domainDoc.addField("domainName", domain.get("domainName"));
>> >>>>>
>> >>>>>         String s = domain.get("columns").toString();
>> >>>>>         obj = JSONValue.parse(s);
>> >>>>>         JSONArray ColumnsArray = (JSONArray) obj;
>> >>>>>
>> >>>>>         SolrInputDocument columnsDoc = new SolrInputDocument();
>> >>>>>         columnsDoc.addField("id", generateID());
>> >>>>>
>> >>>>>         for (int j = 0; j < ColumnsArray.size(); j++) {
>> >>>>>             JSONObject ColumnsObj = (JSONObject) ColumnsArray.get(j);
>> >>>>>             SolrInputDocument columnDoc = new SolrInputDocument();
>> >>>>>             columnDoc.addField("id", generateID());
>> >>>>>             columnDoc.addField("movieName", ColumnsObj.get("movieName"));
>> >>>>>             columnsDoc.addChildDocument(columnDoc);
>> >>>>>         }
>> >>>>>         domainDoc.addChildDocument(columnsDoc);
>> >>>>>         childEventDescEvent.addChildDocument(domainDoc);
>> >>>>>     }
>> >>>>>
>> >>>>>     mainEvent.addChildDocument(childEventDescEvent);
>> >>>>>     mainEvent.addChildDocument(childUserEvent);
>> >>>>>     return mainEvent;
>> >>>>> }
>> >>>>>
>> >>>>> I would be grateful if you could let me know what I am missing.
>> >>>>>
>> >>>>> On Sun, Jul 19, 2015 at 2:16 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>> >>>>>
>> >>>>>> First thing: it looks like you're only sending one document at a
>> >>>>>> time, perhaps with child objects. This is not optimal at all. I
>> >>>>>> usually batch my docs up in groups of 1,000, and there is anecdotal
>> >>>>>> evidence that there may (depending on the docs) be some gains above
>> >>>>>> that number. Gotta balance the batch size off against how big the
>> >>>>>> docs are, of course.
>> >>>>>>
>> >>>>>> Assuming that you really are calling this method for one doc (and
>> >>>>>> children) at a time, a far bigger problem than calling server.add
>> >>>>>> for each parent and its children is that you're then calling
>> >>>>>> solr.commit() every time. This is an anti-pattern. Generally, let the
>> >>>>>> autoCommit setting in solrconfig.xml handle the intermediate commits
>> >>>>>> while the indexing program is running, and only issue a commit at the
>> >>>>>> very end of the job, if at all.
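>> >>>>>>
>> >>>>>> For reference, a minimal solrconfig.xml sketch of that autoCommit setup
>> >>>>>> (the maxTime values here are illustrative; tune them to your load):
>> >>>>>>
>> >>>>>> <autoCommit>
>> >>>>>>   <maxTime>60000</maxTime>            <!-- hard commit every 60s -->
>> >>>>>>   <openSearcher>false</openSearcher>  <!-- don't open a new searcher on hard commits -->
>> >>>>>> </autoCommit>
>> >>>>>> <autoSoftCommit>
>> >>>>>>   <maxTime>120000</maxTime>           <!-- new docs become searchable within 2 min -->
>> >>>>>> </autoSoftCommit>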
>> >>>>>>
>> >>>>>> Best,
>> >>>>>> Erick
>> >>>>>>
>> >>>>>> On Sun, Jul 19, 2015 at 12:08 PM, Vineeth Dasaraju
>> >>>>>> <vineeth.ii...@gmail.com> wrote:
>> >>>>>>> Hi,
>> >>>>>>>
>> >>>>>>> I am trying to index JSON objects (which contain nested JSON objects
>> >>>>>>> and arrays in them) into Solr.
>> >>>>>>>
>> >>>>>>> My JSON object looks like the following (this is fake data that I am
>> >>>>>>> using for this example):
>> >>>>>>>
>> >>>>>>> {
>> >>>>>>>   "RawEventMessage": "Lorem ipsum dolor sit amet, consectetur
>> >>>>>> adipiscing
>> >>>>>>> elit. Aliquam dolor orci, placerat ac pretium a, tincidunt
>> >>> consectetur
>> >>>>>>> mauris. Etiam sollicitudin sapien id odio tempus, non sodales odio
>> >>>>>> iaculis.
>> >>>>>>> Donec fringilla diam at placerat interdum. Proin vitae arcu non
>> augue
>> >>>>>>> facilisis auctor id non neque. Integer non nibh sit amet justo
>> >>> facilisis
>> >>>>>>> semper a vel ligula. Pellentesque commodo vulputate consequat. ",
>> >>>>>>>   "EventUid": "1279706565",
>> >>>>>>>   "TimeOfEvent": "2015-05-01-08-07-13",
>> >>>>>>>   "TimeOfEventUTC": "2015-05-01-01-07-13",
>> >>>>>>>   "EventCollector": "kafka",
>> >>>>>>>   "EventMessageType": "kafka-@column",
>> >>>>>>>   "User": {
>> >>>>>>>       "User": "Lorem ipsum",
>> >>>>>>>       "UserGroup": "Manager",
>> >>>>>>>       "Location": "consectetur adipiscing",
>> >>>>>>>       "Department": "Legal"
>> >>>>>>>   },
>> >>>>>>>   "EventDescription": {
>> >>>>>>>       "EventApplicationName": "",
>> >>>>>>>       "Query": "SELECT * FROM MOVIES",
>> >>>>>>>       "Information": [
>> >>>>>>>           {
>> >>>>>>>               "domainName": "English",
>> >>>>>>>               "columns": [
>> >>>>>>>                   {
>> >>>>>>>                       "movieName": "Casablanca",
>> >>>>>>>                       "duration": "154"
>> >>>>>>>                   },
>> >>>>>>>                   {
>> >>>>>>>                       "movieName": "Die Hard",
>> >>>>>>>                       "duration": "127"
>> >>>>>>>                   }
>> >>>>>>>               ]
>> >>>>>>>           },
>> >>>>>>>           {
>> >>>>>>>               "domainName": "Hindi",
>> >>>>>>>               "columns": [
>> >>>>>>>                   {
>> >>>>>>>                       "movieName": "DDLJ",
>> >>>>>>>                       "duration": "176"
>> >>>>>>>                   }
>> >>>>>>>               ]
>> >>>>>>>           }
>> >>>>>>>       ]
>> >>>>>>>   }
>> >>>>>>> }
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> My function for indexing the object is as follows:
>> >>>>>>>
>> >>>>>>> public static void indexJSON(JSONObject jsonOBJ) throws ParseException,
>> >>>>>>>         IOException, SolrServerException {
>> >>>>>>>     Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
>> >>>>>>>
>> >>>>>>>     SolrInputDocument mainEvent = new SolrInputDocument();
>> >>>>>>>     mainEvent.addField("id", generateID());
>> >>>>>>>     mainEvent.addField("RawEventMessage", jsonOBJ.get("RawEventMessage"));
>> >>>>>>>     mainEvent.addField("EventUid", jsonOBJ.get("EventUid"));
>> >>>>>>>     mainEvent.addField("EventCollector", jsonOBJ.get("EventCollector"));
>> >>>>>>>     mainEvent.addField("EventMessageType", jsonOBJ.get("EventMessageType"));
>> >>>>>>>     mainEvent.addField("TimeOfEvent", jsonOBJ.get("TimeOfEvent"));
>> >>>>>>>     mainEvent.addField("TimeOfEventUTC", jsonOBJ.get("TimeOfEventUTC"));
>> >>>>>>>
>> >>>>>>>     Object obj = parser.parse(jsonOBJ.get("User").toString());
>> >>>>>>>     JSONObject userObj = (JSONObject) obj;
>> >>>>>>>
>> >>>>>>>     SolrInputDocument childUserEvent = new SolrInputDocument();
>> >>>>>>>     childUserEvent.addField("id", generateID());
>> >>>>>>>     childUserEvent.addField("User", userObj.get("User"));
>> >>>>>>>
>> >>>>>>>     obj = parser.parse(jsonOBJ.get("EventDescription").toString());
>> >>>>>>>     JSONObject eventdescriptionObj = (JSONObject) obj;
>> >>>>>>>
>> >>>>>>>     SolrInputDocument childEventDescEvent = new SolrInputDocument();
>> >>>>>>>     childEventDescEvent.addField("id", generateID());
>> >>>>>>>     childEventDescEvent.addField("EventApplicationName",
>> >>>>>>>             eventdescriptionObj.get("EventApplicationName"));
>> >>>>>>>     childEventDescEvent.addField("Query", eventdescriptionObj.get("Query"));
>> >>>>>>>
>> >>>>>>>     obj = JSONValue.parse(eventdescriptionObj.get("Information").toString());
>> >>>>>>>     JSONArray informationArray = (JSONArray) obj;
>> >>>>>>>
>> >>>>>>>     for (int i = 0; i < informationArray.size(); i++) {
>> >>>>>>>         JSONObject domain = (JSONObject) informationArray.get(i);
>> >>>>>>>
>> >>>>>>>         SolrInputDocument domainDoc = new SolrInputDocument();
>> >>>>>>>         domainDoc.addField("id", generateID());
>> >>>>>>>         domainDoc.addField("domainName", domain.get("domainName"));
>> >>>>>>>
>> >>>>>>>         String s = domain.get("columns").toString();
>> >>>>>>>         obj = JSONValue.parse(s);
>> >>>>>>>         JSONArray ColumnsArray = (JSONArray) obj;
>> >>>>>>>
>> >>>>>>>         SolrInputDocument columnsDoc = new SolrInputDocument();
>> >>>>>>>         columnsDoc.addField("id", generateID());
>> >>>>>>>
>> >>>>>>>         for (int j = 0; j < ColumnsArray.size(); j++) {
>> >>>>>>>             JSONObject ColumnsObj = (JSONObject) ColumnsArray.get(j);
>> >>>>>>>             SolrInputDocument columnDoc = new SolrInputDocument();
>> >>>>>>>             columnDoc.addField("id", generateID());
>> >>>>>>>             columnDoc.addField("movieName", ColumnsObj.get("movieName"));
>> >>>>>>>             columnsDoc.addChildDocument(columnDoc);
>> >>>>>>>         }
>> >>>>>>>         domainDoc.addChildDocument(columnsDoc);
>> >>>>>>>         childEventDescEvent.addChildDocument(domainDoc);
>> >>>>>>>     }
>> >>>>>>>
>> >>>>>>>     mainEvent.addChildDocument(childEventDescEvent);
>> >>>>>>>     mainEvent.addChildDocument(childUserEvent);
>> >>>>>>>     batch.add(mainEvent);
>> >>>>>>>     solr.add(batch);
>> >>>>>>>     solr.commit();
>> >>>>>>> }
>> >>>>>>>
>> >>>>>>> When I try to index using the above code, I am able to index only 12
>> >>>>>>> objects per second. Is there a faster way to do the indexing? I
>> >>>>>>> believe I am using the json-fast parser, which is one of the fastest
>> >>>>>>> parsers for JSON.
>> >>>>>>>
>> >>>>>>> Your help will be very valuable to me.
>> >>>>>>>
>> >>>>>>> Thanks,
>> >>>>>>> Vineeth
>> >>>
>>
>
>
