Hi Erick,

As you correctly pointed out, the main reason documents were disappearing was that I was assigning the same id to multiple documents. This was resolved once I switched to the UUIDs that Mohsen suggested. Thank you for your inputs.

Regards,
Vineeth
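For reference, a minimal sketch of the UUID-based generateID() described above, assuming it is a drop-in replacement for the timestamp version quoted further down:

import java.util.UUID;

public static String generateID() {
    // Random (type 4) UUIDs are unique for all practical purposes, so two
    // documents created within the same millisecond no longer share an id.
    return UUID.randomUUID().toString();
}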
On Wed, Jul 22, 2015 at 9:39 AM, Erick Erickson <erickerick...@gmail.com> wrote:

The other classic error is to not send the batch at the end, but at a glance that's not a problem for you: after the while loop you send the batch, which catches any docs left over.

solr.user, might that be your problem? Because I've never seen this happen.

On Tue, Jul 21, 2015 at 1:47 PM, Fadi Mohsen <fadi.moh...@gmail.com> wrote:

In Java: UUID.randomUUID();

That is what I'm using.

Regards

On 21 Jul 2015, at 22:38, Vineeth Dasaraju <vineeth.ii...@gmail.com> wrote:

Hi Upayavira,

I guess that is the problem. I am currently generating IDs with a function that formats the current date and time down to the millisecond. This is the function:

public static String generateID() {
    Date dNow = new Date();
    SimpleDateFormat ft = new SimpleDateFormat("yyMMddhhmmssMs");
    String datetime = ft.format(dNow);
    return datetime;
}

I believe that despite the millisecond precision in the id generation, multiple objects are being assigned the same ID. Can you suggest a better way to generate the ID?

Regards,
Vineeth
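Two things are worth noting about that pattern. In SimpleDateFormat, "Ms" means month followed by second, not milliseconds ("SSS" does), so the ids above have even less resolution than intended. And even with "SSS", every document built within the same millisecond receives the same id. A quick sketch that makes the collision visible (IdCollisionDemo is a hypothetical test harness, not code from the thread):

import java.text.SimpleDateFormat;
import java.util.Date;

public class IdCollisionDemo {
    public static void main(String[] args) {
        SimpleDateFormat ft = new SimpleDateFormat("yyMMddhhmmssSSS");
        String prev = "";
        for (int i = 0; i < 5; i++) {
            // Several passes of a tight loop fall inside one millisecond,
            // so the supposedly unique ids repeat.
            String id = ft.format(new Date());
            System.out.println(id + (id.equals(prev) ? "   <-- duplicate" : ""));
            prev = id;
        }
    }
}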
On Tue, Jul 21, 2015 at 1:29 PM, Upayavira <u...@odoko.co.uk> wrote:

Are you making sure that every document has a unique ID? Index into an empty Solr, then look at your maxDocs vs numDocs. If they are different (maxDocs is higher), then some of your documents have been deleted, meaning some were overwritten.

That might be a place to look.

Upayavira

On Tue, Jul 21, 2015, at 09:24 PM, solr.user.1...@gmail.com wrote:

I can confirm this behavior: it is seen when sending JSON docs in batches, and never happens when sending them one by one, but it is sporadic when sending batches.

It is as if Solr/Jetty drops a couple of documents out of the batch.

Regards

On 21 Jul 2015, at 21:38, Vineeth Dasaraju <vineeth.ii...@gmail.com> wrote:

Hi,

Thank you Erick for your inputs. I tried creating batches of 1000 objects and indexing them into Solr. The performance is way better than before, but I find that the number of indexed documents shown in the dashboard is smaller than the number of documents I actually indexed through SolrJ. My code is as follows:

private static String SOLR_SERVER_URL = "http://localhost:8983/solr/newcore";
private static String JSON_FILE_PATH = "/home/vineeth/week1_fixed.json";
private static JSONParser parser = new JSONParser();
private static SolrClient solr = new HttpSolrClient(SOLR_SERVER_URL);

public static void main(String[] args) throws IOException, SolrServerException, ParseException {
    File file = new File(JSON_FILE_PATH);
    Scanner scn = new Scanner(file, "UTF-8");
    JSONObject object;
    int i = 0;
    Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
    while (scn.hasNext()) {
        object = (JSONObject) parser.parse(scn.nextLine());
        SolrInputDocument doc = indexJSON(object);
        batch.add(doc);
        if (i % 1000 == 0) {
            System.out.println("Indexed " + (i + 1) + " objects.");
            solr.add(batch);
            batch = new ArrayList<SolrInputDocument>();
        }
        i++;
    }
    solr.add(batch);
    solr.commit();
    System.out.println("Indexed " + (i + 1) + " objects.");
}

public static SolrInputDocument indexJSON(JSONObject jsonOBJ) throws ParseException, IOException, SolrServerException {
    Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();

    SolrInputDocument mainEvent = new SolrInputDocument();
    mainEvent.addField("id", generateID());
    mainEvent.addField("RawEventMessage", jsonOBJ.get("RawEventMessage"));
    mainEvent.addField("EventUid", jsonOBJ.get("EventUid"));
    mainEvent.addField("EventCollector", jsonOBJ.get("EventCollector"));
    mainEvent.addField("EventMessageType", jsonOBJ.get("EventMessageType"));
    mainEvent.addField("TimeOfEvent", jsonOBJ.get("TimeOfEvent"));
    mainEvent.addField("TimeOfEventUTC", jsonOBJ.get("TimeOfEventUTC"));

    Object obj = parser.parse(jsonOBJ.get("User").toString());
    JSONObject userObj = (JSONObject) obj;

    SolrInputDocument childUserEvent = new SolrInputDocument();
    childUserEvent.addField("id", generateID());
    childUserEvent.addField("User", userObj.get("User"));

    obj = parser.parse(jsonOBJ.get("EventDescription").toString());
    JSONObject eventdescriptionObj = (JSONObject) obj;

    SolrInputDocument childEventDescEvent = new SolrInputDocument();
    childEventDescEvent.addField("id", generateID());
    childEventDescEvent.addField("EventApplicationName", eventdescriptionObj.get("EventApplicationName"));
    childEventDescEvent.addField("Query", eventdescriptionObj.get("Query"));

    obj = JSONValue.parse(eventdescriptionObj.get("Information").toString());
    JSONArray informationArray = (JSONArray) obj;

    for (int i = 0; i < informationArray.size(); i++) {
        JSONObject domain = (JSONObject) informationArray.get(i);

        SolrInputDocument domainDoc = new SolrInputDocument();
        domainDoc.addField("id", generateID());
        domainDoc.addField("domainName", domain.get("domainName"));

        String s = domain.get("columns").toString();
        obj = JSONValue.parse(s);
        JSONArray ColumnsArray = (JSONArray) obj;

        SolrInputDocument columnsDoc = new SolrInputDocument();
        columnsDoc.addField("id", generateID());

        for (int j = 0; j < ColumnsArray.size(); j++) {
            JSONObject ColumnsObj = (JSONObject) ColumnsArray.get(j);
            SolrInputDocument columnDoc = new SolrInputDocument();
            columnDoc.addField("id", generateID());
            columnDoc.addField("movieName", ColumnsObj.get("movieName"));
            columnsDoc.addChildDocument(columnDoc);
        }
        domainDoc.addChildDocument(columnsDoc);
        childEventDescEvent.addChildDocument(domainDoc);
    }

    mainEvent.addChildDocument(childEventDescEvent);
    mainEvent.addChildDocument(childUserEvent);
    return mainEvent;
}

I would be grateful if you could let me know what I am missing.
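One small thing about the loop above, separate from the duplicate-id problem: because i starts at 0, i % 1000 == 0 is true on the very first pass, so the first batch holds a single document and the printed counts drift by one. A sketch of the usual shape, reusing the scn, parser, indexJSON, and solr declarations from the quoted code:

Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
int count = 0;
while (scn.hasNext()) {
    batch.add(indexJSON((JSONObject) parser.parse(scn.nextLine())));
    count++;
    if (count % 1000 == 0) {   // flush a full batch of 1000
        solr.add(batch);
        batch = new ArrayList<SolrInputDocument>();
        System.out.println("Indexed " + count + " objects.");
    }
}
if (!batch.isEmpty()) {        // send any leftovers after the loop
    solr.add(batch);
}
solr.commit();                 // one commit, at the end of the run
System.out.println("Indexed " + count + " objects.");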
On Sun, Jul 19, 2015 at 2:16 PM, Erick Erickson <erickerick...@gmail.com> wrote:

First thing: it looks like you're only sending one document at a time, perhaps with child objects. This is not optimal at all. I usually batch my docs up in groups of 1,000, and there is anecdotal evidence that there may (depending on the docs) be some gains above that number. Gotta balance the batch size against how big the docs are, of course.

Assuming that you really are calling this method for one doc (and children) at a time, the far bigger problem, beyond calling server.add for each parent and its children, is that you're then calling solr.commit() every time. This is an anti-pattern. Generally, let the autoCommit setting in solrconfig.xml handle the intermediate commits while the indexing program is running, and only issue a commit at the very end of the job, if at all.

Best,
Erick
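If editing solrconfig.xml is not convenient, SolrJ's commitWithin parameter gets much the same effect from the client side. A sketch (the 30-second window is an arbitrary choice for illustration, not a recommendation):

// Ask Solr to make the batch searchable within 30s instead of committing per call.
solr.add(batch, 30000);

// ...and commit explicitly only once, at the very end of the job:
solr.commit();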
jsonOBJ.get("EventMessageType")); > >>>>>>> mainEvent.addField("TimeOfEvent", jsonOBJ.get("TimeOfEvent")); > >>>>>>> mainEvent.addField("TimeOfEventUTC", > >>> jsonOBJ.get("TimeOfEventUTC")); > >>>>>>> > >>>>>>> Object obj = parser.parse(jsonOBJ.get("User").toString()); > >>>>>>> JSONObject userObj = (JSONObject) obj; > >>>>>>> > >>>>>>> SolrInputDocument childUserEvent = new SolrInputDocument(); > >>>>>>> childUserEvent.addField("id", generateID()); > >>>>>>> childUserEvent.addField("User", userObj.get("User")); > >>>>>>> > >>>>>>> obj = parser.parse(jsonOBJ.get("EventDescription").toString()); > >>>>>>> JSONObject eventdescriptionObj = (JSONObject) obj; > >>>>>>> > >>>>>>> SolrInputDocument childEventDescEvent = new SolrInputDocument(); > >>>>>>> childEventDescEvent.addField("id", generateID()); > >>>>>>> childEventDescEvent.addField("EventApplicationName", > >>>>>>> eventdescriptionObj.get("EventApplicationName")); > >>>>>>> childEventDescEvent.addField("Query", > >>>>>> eventdescriptionObj.get("Query")); > >>>>>>> > >>>>>>> obj= > >>>>>> JSONValue.parse(eventdescriptionObj.get("Information").toString()); > >>>>>>> JSONArray informationArray = (JSONArray) obj; > >>>>>>> > >>>>>>> for(int i = 0; i<informationArray.size(); i++){ > >>>>>>> JSONObject domain = (JSONObject) informationArray.get(i); > >>>>>>> > >>>>>>> SolrInputDocument domainDoc = new SolrInputDocument(); > >>>>>>> domainDoc.addField("id", generateID()); > >>>>>>> domainDoc.addField("domainName", domain.get("domainName")); > >>>>>>> > >>>>>>> String s = domain.get("columns").toString(); > >>>>>>> obj= JSONValue.parse(s); > >>>>>>> JSONArray ColumnsArray = (JSONArray) obj; > >>>>>>> > >>>>>>> SolrInputDocument columnsDoc = new SolrInputDocument(); > >>>>>>> columnsDoc.addField("id", generateID()); > >>>>>>> > >>>>>>> for(int j = 0; j<ColumnsArray.size(); j++){ > >>>>>>> JSONObject ColumnsObj = (JSONObject) ColumnsArray.get(j); > >>>>>>> SolrInputDocument columnDoc = new SolrInputDocument(); > >>>>>>> columnDoc.addField("id", generateID()); > >>>>>>> columnDoc.addField("movieName", > >>> ColumnsObj.get("movieName")); > >>>>>>> columnsDoc.addChildDocument(columnDoc); > >>>>>>> } > >>>>>>> domainDoc.addChildDocument(columnsDoc); > >>>>>>> childEventDescEvent.addChildDocument(domainDoc); > >>>>>>> } > >>>>>>> > >>>>>>> mainEvent.addChildDocument(childEventDescEvent); > >>>>>>> mainEvent.addChildDocument(childUserEvent); > >>>>>>> batch.add(mainEvent); > >>>>>>> solr.add(batch); > >>>>>>> solr.commit(); > >>>>>>> } > >>>>>>> > >>>>>>> When I try to index the using the above code, I am able to index > >>> only 12 > >>>>>>> Objects per second. Is there a faster way to do the indexing? I > >>> believe I > >>>>>>> am using the json-fast parser which is one of the fastest parsers > for > >>>>>> json. > >>>>>>> > >>>>>>> Your help will be very valuable to me. > >>>>>>> > >>>>>>> Thanks, > >>>>>>> Vineeth > >>> >