Sorry, the "ID" mistake was pointed out by Upayavira. Thank you, Upayavira!
On Wed, Jul 22, 2015 at 10:56 AM, Vineeth Dasaraju <vineeth.ii...@gmail.com> wrote:
> Hi Erick,
>
> As you correctly pointed out, the main reason documents were disappearing
> was that I was assigning the same id to multiple documents. This got
> resolved after I used a UUID as suggested by Mohsen. Thank you for your
> inputs.
>
> Regards,
> Vineeth
>
> On Wed, Jul 22, 2015 at 9:39 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>
>> The other classic error is not sending the last batch at the end, but at
>> a glance that's not a problem for you: after the while loop you send the
>> batch, which catches any docs left over.
>>
>> solr.user, might that be your problem? Because I've never seen this
>> happen.
>>
>> On Tue, Jul 21, 2015 at 1:47 PM, Fadi Mohsen <fadi.moh...@gmail.com> wrote:
>>
>>> In Java: UUID.randomUUID();
>>>
>>> That is what I'm using.
>>>
>>> Regards
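A minimal sketch of that suggestion, assuming only java.util (the class name IdGenerator is illustrative). It is also worth spelling out why the timestamp scheme quoted below collides: in SimpleDateFormat, M means month and s means seconds, so the pattern "yyMMddhhmmssMs" never actually emits milliseconds (that would be SSS), and even a true millisecond timestamp would still collide for documents created within the same millisecond.

    import java.util.UUID;

    public class IdGenerator {
        // Each call returns a fresh random 128-bit id, e.g.
        // "3f1c2d9a-5b7e-4c21-9d8e-0a6b4c2f7e11". Unlike a formatted
        // timestamp, two calls cannot realistically produce the same id.
        public static String generateID() {
            return UUID.randomUUID().toString();
        }
    }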
>>> On 21 Jul 2015, at 22:38, Vineeth Dasaraju <vineeth.ii...@gmail.com> wrote:
>>>
>>>> Hi Upayavira,
>>>>
>>>> I guess that is the problem. I am currently using a function for
>>>> generating an ID: it takes the current date and time, down to
>>>> milliseconds, and builds the id from it. This is the function:
>>>>
>>>> public static String generateID() {
>>>>     Date dNow = new Date();
>>>>     SimpleDateFormat ft = new SimpleDateFormat("yyMMddhhmmssMs");
>>>>     String datetime = ft.format(dNow);
>>>>     return datetime;
>>>> }
>>>>
>>>> I believe that despite having millisecond precision in the id
>>>> generation, multiple objects are being assigned the same ID. Can you
>>>> suggest a better way to generate the ID?
>>>>
>>>> Regards,
>>>> Vineeth
>>>>
>>>> On Tue, Jul 21, 2015 at 1:29 PM, Upayavira <u...@odoko.co.uk> wrote:
>>>>
>>>>> Are you making sure that every document has a unique ID? Index into an
>>>>> empty Solr, then look at your maxdocs vs numdocs. If they are different
>>>>> (maxdocs is higher) then some of your documents have been deleted,
>>>>> meaning some were overwritten.
>>>>>
>>>>> That might be a place to look.
>>>>>
>>>>> Upayavira
>>>>>
>>>>> On Tue, Jul 21, 2015, at 09:24 PM, solr.user.1...@gmail.com wrote:
>>>>>
>>>>>> I can confirm this behavior. It is seen when sending json docs in
>>>>>> batches, never when sending one by one, but it is sporadic when
>>>>>> sending batches.
>>>>>>
>>>>>> It is as if solr/jetty drops a couple of documents out of the batch.
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> On 21 Jul 2015, at 21:38, Vineeth Dasaraju <vineeth.ii...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Thank you, Erick, for your inputs. I tried creating batches of 1000
>>>>>>> objects and indexing them into Solr. The performance is way better
>>>>>>> than before, but I find that the number of indexed documents shown in
>>>>>>> the dashboard is lower than the number of documents I actually
>>>>>>> indexed through SolrJ. My code is as follows:
>>>>>>>
>>>>>>> private static String SOLR_SERVER_URL = "http://localhost:8983/solr/newcore";
>>>>>>> private static String JSON_FILE_PATH = "/home/vineeth/week1_fixed.json";
>>>>>>> private static JSONParser parser = new JSONParser();
>>>>>>> private static SolrClient solr = new HttpSolrClient(SOLR_SERVER_URL);
>>>>>>>
>>>>>>> public static void main(String[] args) throws IOException,
>>>>>>>         SolrServerException, ParseException {
>>>>>>>     File file = new File(JSON_FILE_PATH);
>>>>>>>     Scanner scn = new Scanner(file, "UTF-8");
>>>>>>>     JSONObject object;
>>>>>>>     int i = 0;
>>>>>>>     Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
>>>>>>>     while (scn.hasNext()) {
>>>>>>>         object = (JSONObject) parser.parse(scn.nextLine());
>>>>>>>         SolrInputDocument doc = indexJSON(object);
>>>>>>>         batch.add(doc);
>>>>>>>         if (i % 1000 == 0) {
>>>>>>>             System.out.println("Indexed " + (i + 1) + " objects.");
>>>>>>>             solr.add(batch);
>>>>>>>             batch = new ArrayList<SolrInputDocument>();
>>>>>>>         }
>>>>>>>         i++;
>>>>>>>     }
>>>>>>>     solr.add(batch);
>>>>>>>     solr.commit();
>>>>>>>     System.out.println("Indexed " + (i + 1) + " objects.");
>>>>>>> }
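A side note on the main() loop above: because i starts at 0, the test i % 1000 == 0 already succeeds on the very first document, so the first batch sent holds a single document, and the final println reports one more document than was read (i has already been incremented for every document, so i, not i + 1, is the total). No documents are lost this way; still, a sketch of a variant that counts before testing and flushes only full batches (same SolrJ 5.x types as in the thread):

    import java.util.ArrayList;
    import java.util.Collection;
    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class BatchLoop {
        // Flushes only full 1000-doc batches inside the loop; the leftover
        // partial batch is sent once after the loop, then a single commit.
        static void drain(SolrClient solr, Iterable<SolrInputDocument> docs)
                throws Exception {
            Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
            long count = 0;
            for (SolrInputDocument doc : docs) {
                batch.add(doc);
                count++;
                if (count % 1000 == 0) {
                    solr.add(batch);
                    batch = new ArrayList<SolrInputDocument>();
                    System.out.println("Indexed " + count + " objects.");
                }
            }
            if (!batch.isEmpty()) {
                solr.add(batch); // catch any docs left over
            }
            solr.commit();
            System.out.println("Indexed " + count + " objects.");
        }
    }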
>>>>>>>
>>>>>>> public static SolrInputDocument indexJSON(JSONObject jsonOBJ) throws
>>>>>>>         ParseException, IOException, SolrServerException {
>>>>>>>     Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
>>>>>>>
>>>>>>>     SolrInputDocument mainEvent = new SolrInputDocument();
>>>>>>>     mainEvent.addField("id", generateID());
>>>>>>>     mainEvent.addField("RawEventMessage", jsonOBJ.get("RawEventMessage"));
>>>>>>>     mainEvent.addField("EventUid", jsonOBJ.get("EventUid"));
>>>>>>>     mainEvent.addField("EventCollector", jsonOBJ.get("EventCollector"));
>>>>>>>     mainEvent.addField("EventMessageType", jsonOBJ.get("EventMessageType"));
>>>>>>>     mainEvent.addField("TimeOfEvent", jsonOBJ.get("TimeOfEvent"));
>>>>>>>     mainEvent.addField("TimeOfEventUTC", jsonOBJ.get("TimeOfEventUTC"));
>>>>>>>
>>>>>>>     Object obj = parser.parse(jsonOBJ.get("User").toString());
>>>>>>>     JSONObject userObj = (JSONObject) obj;
>>>>>>>
>>>>>>>     SolrInputDocument childUserEvent = new SolrInputDocument();
>>>>>>>     childUserEvent.addField("id", generateID());
>>>>>>>     childUserEvent.addField("User", userObj.get("User"));
>>>>>>>
>>>>>>>     obj = parser.parse(jsonOBJ.get("EventDescription").toString());
>>>>>>>     JSONObject eventdescriptionObj = (JSONObject) obj;
>>>>>>>
>>>>>>>     SolrInputDocument childEventDescEvent = new SolrInputDocument();
>>>>>>>     childEventDescEvent.addField("id", generateID());
>>>>>>>     childEventDescEvent.addField("EventApplicationName",
>>>>>>>             eventdescriptionObj.get("EventApplicationName"));
>>>>>>>     childEventDescEvent.addField("Query", eventdescriptionObj.get("Query"));
>>>>>>>
>>>>>>>     obj = JSONValue.parse(eventdescriptionObj.get("Information").toString());
>>>>>>>     JSONArray informationArray = (JSONArray) obj;
>>>>>>>
>>>>>>>     for (int i = 0; i < informationArray.size(); i++) {
>>>>>>>         JSONObject domain = (JSONObject) informationArray.get(i);
>>>>>>>
>>>>>>>         SolrInputDocument domainDoc = new SolrInputDocument();
>>>>>>>         domainDoc.addField("id", generateID());
>>>>>>>         domainDoc.addField("domainName", domain.get("domainName"));
>>>>>>>
>>>>>>>         String s = domain.get("columns").toString();
>>>>>>>         obj = JSONValue.parse(s);
>>>>>>>         JSONArray ColumnsArray = (JSONArray) obj;
>>>>>>>
>>>>>>>         SolrInputDocument columnsDoc = new SolrInputDocument();
>>>>>>>         columnsDoc.addField("id", generateID());
>>>>>>>
>>>>>>>         for (int j = 0; j < ColumnsArray.size(); j++) {
>>>>>>>             JSONObject ColumnsObj = (JSONObject) ColumnsArray.get(j);
>>>>>>>             SolrInputDocument columnDoc = new SolrInputDocument();
>>>>>>>             columnDoc.addField("id", generateID());
>>>>>>>             columnDoc.addField("movieName", ColumnsObj.get("movieName"));
>>>>>>>             columnsDoc.addChildDocument(columnDoc);
>>>>>>>         }
>>>>>>>         domainDoc.addChildDocument(columnsDoc);
>>>>>>>         childEventDescEvent.addChildDocument(domainDoc);
>>>>>>>     }
>>>>>>>
>>>>>>>     mainEvent.addChildDocument(childEventDescEvent);
>>>>>>>     mainEvent.addChildDocument(childUserEvent);
>>>>>>>     return mainEvent;
>>>>>>> }
>>>>>>>
>>>>>>> I would be grateful if you could let me know what I am missing.
>>>>>>>
>>>>>>> On Sun, Jul 19, 2015 at 2:16 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>>>>>>>
>>>>>>>> First thing is it looks like you're only sending one document at a
>>>>>>>> time, perhaps with child objects. This is not optimal at all. I
>>>>>>>> usually batch my docs up in groups of 1,000, and there is anecdotal
>>>>>>>> evidence that there may (depending on the docs) be some gains above
>>>>>>>> that number. Gotta balance the batch size off against how big the
>>>>>>>> docs are, of course.
>>>>>>>>
>>>>>>>> Assuming that you really are calling this method for one doc (and
>>>>>>>> children) at a time, the far bigger problem than calling server.add
>>>>>>>> for each parent/children is that you're then calling solr.commit()
>>>>>>>> every time. This is an anti-pattern. Generally, let the autoCommit
>>>>>>>> setting in solrconfig.xml handle the intermediate commits while the
>>>>>>>> indexing program is running, and only issue a commit at the very end
>>>>>>>> of the job, if at all.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Erick
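For what Erick describes, SolrJ also has a direct hook: each add can carry a commitWithin deadline, which hands commit timing to the server much like autoCommit in solrconfig.xml. A sketch of that overload (the 30-second window is an arbitrary example, not a value from the thread):

    import java.util.Collection;
    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class CommitWithinExample {
        // Instead of solr.add(batch) followed by solr.commit() per call,
        // ask Solr to make the documents visible within 30 seconds; the
        // client then issues no explicit commit until the whole job ends.
        static void send(SolrClient solr, Collection<SolrInputDocument> batch)
                throws Exception {
            solr.add(batch, 30000); // commitWithin, in milliseconds
        }
    }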
", >> >>>>>>> "EventUid": "1279706565", >> >>>>>>> "TimeOfEvent": "2015-05-01-08-07-13", >> >>>>>>> "TimeOfEventUTC": "2015-05-01-01-07-13", >> >>>>>>> "EventCollector": "kafka", >> >>>>>>> "EventMessageType": "kafka-@column", >> >>>>>>> "User": { >> >>>>>>> "User": "Lorem ipsum", >> >>>>>>> "UserGroup": "Manager", >> >>>>>>> "Location": "consectetur adipiscing", >> >>>>>>> "Department": "Legal" >> >>>>>>> }, >> >>>>>>> "EventDescription": { >> >>>>>>> "EventApplicationName": "", >> >>>>>>> "Query": "SELECT * FROM MOVIES", >> >>>>>>> "Information": [ >> >>>>>>> { >> >>>>>>> "domainName": "English", >> >>>>>>> "columns": [ >> >>>>>>> { >> >>>>>>> "movieName": "Casablanca", >> >>>>>>> "duration": "154", >> >>>>>>> }, >> >>>>>>> { >> >>>>>>> "movieName": "Die Hard", >> >>>>>>> "duration": "127", >> >>>>>>> } >> >>>>>>> ] >> >>>>>>> }, >> >>>>>>> { >> >>>>>>> "domainName": "Hindi", >> >>>>>>> "columns": [ >> >>>>>>> { >> >>>>>>> "movieName": "DDLJ", >> >>>>>>> "duration": "176", >> >>>>>>> } >> >>>>>>> ] >> >>>>>>> } >> >>>>>>> ] >> >>>>>>> } >> >>>>>>> } >> >>>>>>> >> >>>>>>> >> >>>>>>> >> >>>>>>> My function for indexing the object is as follows: >> >>>>>>> >> >>>>>>> public static void indexJSON(JSONObject jsonOBJ) throws >> >>> ParseException, >> >>>>>>> IOException, SolrServerException { >> >>>>>>> Collection<SolrInputDocument> batch = new >> >>>>>>> ArrayList<SolrInputDocument>(); >> >>>>>>> >> >>>>>>> SolrInputDocument mainEvent = new SolrInputDocument(); >> >>>>>>> mainEvent.addField("id", generateID()); >> >>>>>>> mainEvent.addField("RawEventMessage", >> >>>>>> jsonOBJ.get("RawEventMessage")); >> >>>>>>> mainEvent.addField("EventUid", jsonOBJ.get("EventUid")); >> >>>>>>> mainEvent.addField("EventCollector", >> >>> jsonOBJ.get("EventCollector")); >> >>>>>>> mainEvent.addField("EventMessageType", >> >>>>>> jsonOBJ.get("EventMessageType")); >> >>>>>>> mainEvent.addField("TimeOfEvent", jsonOBJ.get("TimeOfEvent")); >> >>>>>>> mainEvent.addField("TimeOfEventUTC", >> >>> jsonOBJ.get("TimeOfEventUTC")); >> >>>>>>> >> >>>>>>> Object obj = parser.parse(jsonOBJ.get("User").toString()); >> >>>>>>> JSONObject userObj = (JSONObject) obj; >> >>>>>>> >> >>>>>>> SolrInputDocument childUserEvent = new SolrInputDocument(); >> >>>>>>> childUserEvent.addField("id", generateID()); >> >>>>>>> childUserEvent.addField("User", userObj.get("User")); >> >>>>>>> >> >>>>>>> obj = parser.parse(jsonOBJ.get("EventDescription").toString()); >> >>>>>>> JSONObject eventdescriptionObj = (JSONObject) obj; >> >>>>>>> >> >>>>>>> SolrInputDocument childEventDescEvent = new SolrInputDocument(); >> >>>>>>> childEventDescEvent.addField("id", generateID()); >> >>>>>>> childEventDescEvent.addField("EventApplicationName", >> >>>>>>> eventdescriptionObj.get("EventApplicationName")); >> >>>>>>> childEventDescEvent.addField("Query", >> >>>>>> eventdescriptionObj.get("Query")); >> >>>>>>> >> >>>>>>> obj= >> >>>>>> JSONValue.parse(eventdescriptionObj.get("Information").toString()); >> >>>>>>> JSONArray informationArray = (JSONArray) obj; >> >>>>>>> >> >>>>>>> for(int i = 0; i<informationArray.size(); i++){ >> >>>>>>> JSONObject domain = (JSONObject) informationArray.get(i); >> >>>>>>> >> >>>>>>> SolrInputDocument domainDoc = new SolrInputDocument(); >> >>>>>>> domainDoc.addField("id", generateID()); >> >>>>>>> domainDoc.addField("domainName", domain.get("domainName")); >> >>>>>>> >> >>>>>>> String s = domain.get("columns").toString(); >> >>>>>>> obj= JSONValue.parse(s); >> >>>>>>> JSONArray ColumnsArray = (JSONArray) obj; >> >>>>>>> >> 
>>>>>>>>>
>>>>>>>>> public static void indexJSON(JSONObject jsonOBJ) throws
>>>>>>>>>         ParseException, IOException, SolrServerException {
>>>>>>>>>     Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
>>>>>>>>>
>>>>>>>>>     SolrInputDocument mainEvent = new SolrInputDocument();
>>>>>>>>>     mainEvent.addField("id", generateID());
>>>>>>>>>     mainEvent.addField("RawEventMessage", jsonOBJ.get("RawEventMessage"));
>>>>>>>>>     mainEvent.addField("EventUid", jsonOBJ.get("EventUid"));
>>>>>>>>>     mainEvent.addField("EventCollector", jsonOBJ.get("EventCollector"));
>>>>>>>>>     mainEvent.addField("EventMessageType", jsonOBJ.get("EventMessageType"));
>>>>>>>>>     mainEvent.addField("TimeOfEvent", jsonOBJ.get("TimeOfEvent"));
>>>>>>>>>     mainEvent.addField("TimeOfEventUTC", jsonOBJ.get("TimeOfEventUTC"));
>>>>>>>>>
>>>>>>>>>     Object obj = parser.parse(jsonOBJ.get("User").toString());
>>>>>>>>>     JSONObject userObj = (JSONObject) obj;
>>>>>>>>>
>>>>>>>>>     SolrInputDocument childUserEvent = new SolrInputDocument();
>>>>>>>>>     childUserEvent.addField("id", generateID());
>>>>>>>>>     childUserEvent.addField("User", userObj.get("User"));
>>>>>>>>>
>>>>>>>>>     obj = parser.parse(jsonOBJ.get("EventDescription").toString());
>>>>>>>>>     JSONObject eventdescriptionObj = (JSONObject) obj;
>>>>>>>>>
>>>>>>>>>     SolrInputDocument childEventDescEvent = new SolrInputDocument();
>>>>>>>>>     childEventDescEvent.addField("id", generateID());
>>>>>>>>>     childEventDescEvent.addField("EventApplicationName",
>>>>>>>>>             eventdescriptionObj.get("EventApplicationName"));
>>>>>>>>>     childEventDescEvent.addField("Query", eventdescriptionObj.get("Query"));
>>>>>>>>>
>>>>>>>>>     obj = JSONValue.parse(eventdescriptionObj.get("Information").toString());
>>>>>>>>>     JSONArray informationArray = (JSONArray) obj;
>>>>>>>>>
>>>>>>>>>     for (int i = 0; i < informationArray.size(); i++) {
>>>>>>>>>         JSONObject domain = (JSONObject) informationArray.get(i);
>>>>>>>>>
>>>>>>>>>         SolrInputDocument domainDoc = new SolrInputDocument();
>>>>>>>>>         domainDoc.addField("id", generateID());
>>>>>>>>>         domainDoc.addField("domainName", domain.get("domainName"));
>>>>>>>>>
>>>>>>>>>         String s = domain.get("columns").toString();
>>>>>>>>>         obj = JSONValue.parse(s);
>>>>>>>>>         JSONArray ColumnsArray = (JSONArray) obj;
>>>>>>>>>
>>>>>>>>>         SolrInputDocument columnsDoc = new SolrInputDocument();
>>>>>>>>>         columnsDoc.addField("id", generateID());
>>>>>>>>>
>>>>>>>>>         for (int j = 0; j < ColumnsArray.size(); j++) {
>>>>>>>>>             JSONObject ColumnsObj = (JSONObject) ColumnsArray.get(j);
>>>>>>>>>             SolrInputDocument columnDoc = new SolrInputDocument();
>>>>>>>>>             columnDoc.addField("id", generateID());
>>>>>>>>>             columnDoc.addField("movieName", ColumnsObj.get("movieName"));
>>>>>>>>>             columnsDoc.addChildDocument(columnDoc);
>>>>>>>>>         }
>>>>>>>>>         domainDoc.addChildDocument(columnsDoc);
>>>>>>>>>         childEventDescEvent.addChildDocument(domainDoc);
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>>     mainEvent.addChildDocument(childEventDescEvent);
>>>>>>>>>     mainEvent.addChildDocument(childUserEvent);
>>>>>>>>>     batch.add(mainEvent);
>>>>>>>>>     solr.add(batch);
>>>>>>>>>     solr.commit();
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> When I try to index using the above code, I am able to index only
>>>>>>>>> 12 objects per second. Is there a faster way to do the indexing? I
>>>>>>>>> believe I am using the json-fast parser, which is one of the
>>>>>>>>> fastest parsers for json.
>>>>>>>>>
>>>>>>>>> Your help will be very valuable to me.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Vineeth
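Finally, Upayavira's maxdocs-vs-numdocs check is easy to script: numFound from a match-all query equals numDocs, and if the core dashboard's Max Doc is higher on a freshly built index, the difference is documents that were overwritten. A sketch against the core URL used in the thread, assuming the same SolrJ 5.x API:

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;

    public class IndexCheck {
        public static void main(String[] args) throws Exception {
            SolrClient solr = new HttpSolrClient("http://localhost:8983/solr/newcore");
            SolrQuery q = new SolrQuery("*:*");
            q.setRows(0); // we only want the count, not the documents
            long numFound = solr.query(q).getResults().getNumFound();
            System.out.println("Documents visible after commit: " + numFound);
            solr.close();
        }
    }

Compare numFound against the number of documents sent; maxDoc itself is visible on the core overview page or via the /admin/luke handler.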