Hi Upayavira,

I guess that is the problem. I am currently using a function that generates the ID from the current date and time, down to milliseconds. This is the function:

    public static String generateID() {
        Date dNow = new Date();
        SimpleDateFormat ft = new SimpleDateFormat("yyMMddhhmmssMs");
        String datetime = ft.format(dNow);
        return datetime;
    }

I believe that despite the millisecond precision in the ID generation, multiple objects are being assigned the same ID. Can you suggest a better way to generate the ID?

Regards,
Vineeth

On Tue, Jul 21, 2015 at 1:29 PM, Upayavira <u...@odoko.co.uk> wrote:

> Are you making sure that every document has a unique ID? Index into an
> empty Solr, then look at your maxdocs vs numdocs. If they are different
> (maxdocs is higher) then some of your documents have been deleted,
> meaning some were overwritten.
>
> That might be a place to look.
>
> Upayavira
>
> On Tue, Jul 21, 2015, at 09:24 PM, solr.user.1...@gmail.com wrote:
> > I can confirm this behavior. It is seen when sending JSON docs in
> > batches: it never happens when sending them one by one, but occurs
> > sporadically when sending batches.
> >
> > It is as if Solr/Jetty drops a couple of documents out of the batch.
> >
> > Regards
> >
> > > On 21 Jul 2015, at 21:38, Vineeth Dasaraju <vineeth.ii...@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > Thank you, Erick, for your inputs. I tried creating batches of 1000
> > > objects and indexing them into Solr. The performance is way better
> > > than before, but I find that the number of indexed documents shown in
> > > the dashboard is lower than the number of documents that I actually
> > > indexed through solrj.
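A side note on the `generateID()` above: in `SimpleDateFormat`, the pattern letters `M` and `s` mean month and second (milliseconds are `SSS`, and `hh` is the 12-hour clock), so `"yyMMddhhmmssMs"` has at best one-second resolution, and every document created within the same second gets the same ID. A sketch of a collision-free alternative using `java.util.UUID` (the class name is illustrative, not from the thread):

```java
import java.util.UUID;

public class IdGenerator {
    // A version-4 (random) UUID carries 122 random bits, so two IDs
    // generated in the same millisecond will still differ in practice.
    public static String generateID() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        System.out.println(generateID());
        System.out.println(generateID());
    }
}
```

Unlike a timestamp, this needs no clock granularity assumptions, so it stays safe even when thousands of documents are built per second.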
> > > My code is as follows:
> > >
> > >     private static String SOLR_SERVER_URL = "http://localhost:8983/solr/newcore";
> > >     private static String JSON_FILE_PATH = "/home/vineeth/week1_fixed.json";
> > >     private static JSONParser parser = new JSONParser();
> > >     private static SolrClient solr = new HttpSolrClient(SOLR_SERVER_URL);
> > >
> > >     public static void main(String[] args) throws IOException,
> > >             SolrServerException, ParseException {
> > >         File file = new File(JSON_FILE_PATH);
> > >         Scanner scn = new Scanner(file, "UTF-8");
> > >         JSONObject object;
> > >         int i = 0;
> > >         Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
> > >         while (scn.hasNext()) {
> > >             object = (JSONObject) parser.parse(scn.nextLine());
> > >             SolrInputDocument doc = indexJSON(object);
> > >             batch.add(doc);
> > >             if (i % 1000 == 0) {
> > >                 System.out.println("Indexed " + (i + 1) + " objects.");
> > >                 solr.add(batch);
> > >                 batch = new ArrayList<SolrInputDocument>();
> > >             }
> > >             i++;
> > >         }
> > >         solr.add(batch);
> > >         solr.commit();
> > >         System.out.println("Indexed " + (i + 1) + " objects.");
> > >     }
> > >
> > >     public static SolrInputDocument indexJSON(JSONObject jsonOBJ) throws
> > >             ParseException, IOException, SolrServerException {
> > >         Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
> > >
> > >         SolrInputDocument mainEvent = new SolrInputDocument();
> > >         mainEvent.addField("id", generateID());
> > >         mainEvent.addField("RawEventMessage", jsonOBJ.get("RawEventMessage"));
> > >         mainEvent.addField("EventUid", jsonOBJ.get("EventUid"));
> > >         mainEvent.addField("EventCollector", jsonOBJ.get("EventCollector"));
> > >         mainEvent.addField("EventMessageType", jsonOBJ.get("EventMessageType"));
> > >         mainEvent.addField("TimeOfEvent", jsonOBJ.get("TimeOfEvent"));
> > >         mainEvent.addField("TimeOfEventUTC", jsonOBJ.get("TimeOfEventUTC"));
> > >
> > >         Object obj = parser.parse(jsonOBJ.get("User").toString());
> > >         JSONObject userObj = (JSONObject) obj;
> > >
> > >         SolrInputDocument childUserEvent = new SolrInputDocument();
> > >         childUserEvent.addField("id", generateID());
> > >         childUserEvent.addField("User", userObj.get("User"));
> > >
> > >         obj = parser.parse(jsonOBJ.get("EventDescription").toString());
> > >         JSONObject eventdescriptionObj = (JSONObject) obj;
> > >
> > >         SolrInputDocument childEventDescEvent = new SolrInputDocument();
> > >         childEventDescEvent.addField("id", generateID());
> > >         childEventDescEvent.addField("EventApplicationName",
> > >                 eventdescriptionObj.get("EventApplicationName"));
> > >         childEventDescEvent.addField("Query", eventdescriptionObj.get("Query"));
> > >
> > >         obj = JSONValue.parse(eventdescriptionObj.get("Information").toString());
> > >         JSONArray informationArray = (JSONArray) obj;
> > >
> > >         for (int i = 0; i < informationArray.size(); i++) {
> > >             JSONObject domain = (JSONObject) informationArray.get(i);
> > >
> > >             SolrInputDocument domainDoc = new SolrInputDocument();
> > >             domainDoc.addField("id", generateID());
> > >             domainDoc.addField("domainName", domain.get("domainName"));
> > >
> > >             String s = domain.get("columns").toString();
> > >             obj = JSONValue.parse(s);
> > >             JSONArray ColumnsArray = (JSONArray) obj;
> > >
> > >             SolrInputDocument columnsDoc = new SolrInputDocument();
> > >             columnsDoc.addField("id", generateID());
> > >
> > >             for (int j = 0; j < ColumnsArray.size(); j++) {
> > >                 JSONObject ColumnsObj = (JSONObject) ColumnsArray.get(j);
> > >                 SolrInputDocument columnDoc = new SolrInputDocument();
> > >                 columnDoc.addField("id", generateID());
> > >                 columnDoc.addField("movieName", ColumnsObj.get("movieName"));
> > >                 columnsDoc.addChildDocument(columnDoc);
> > >             }
> > >             domainDoc.addChildDocument(columnsDoc);
> > >             childEventDescEvent.addChildDocument(domainDoc);
> > >         }
> > >
> > >         mainEvent.addChildDocument(childEventDescEvent);
> > >         mainEvent.addChildDocument(childUserEvent);
> > >         return mainEvent;
> > >     }
> > >
> > > I would be grateful if you could let me know what I am missing.
> > >
> > > On Sun, Jul 19, 2015 at 2:16 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> > >
> > >> First thing is it looks like you're only sending one document at a
> > >> time, perhaps with child objects. This is not optimal at all. I
> > >> usually batch my docs up in groups of 1,000, and there is anecdotal
> > >> evidence that there may (depending on the docs) be some gains above
> > >> that number. Gotta balance the batch size against how big the docs
> > >> are, of course.
> > >>
> > >> Assuming that you really are calling this method for one doc (and
> > >> children) at a time, the far bigger problem, beyond calling
> > >> server.add for each parent and its children, is that you're then calling
> > >> solr.commit() every time. This is an anti-pattern. Generally, let the
> > >> autoCommit setting in solrconfig.xml handle the intermediate commits
> > >> while the indexing program is running, and only issue a commit at the
> > >> very end of the job, if at all.
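(For reference, the autoCommit setting mentioned above lives in solrconfig.xml. A sketch with illustrative values, not taken from this thread; tune maxTime to your ingest rate:)

```xml
<!-- solrconfig.xml: hard commits persist the index to disk,
     soft commits control when new documents become searchable -->
<autoCommit>
  <maxTime>15000</maxTime>           <!-- hard commit at most every 15 s -->
  <openSearcher>false</openSearcher> <!-- skip the searcher-reopen cost here -->
</autoCommit>
<autoSoftCommit>
  <maxTime>60000</maxTime>           <!-- new docs visible within 60 s -->
</autoSoftCommit>
```

With something like this in place, the indexing client needs at most one commit at the very end of the job.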
> > >>
> > >> Best,
> > >> Erick
> > >>
> > >> On Sun, Jul 19, 2015 at 12:08 PM, Vineeth Dasaraju
> > >> <vineeth.ii...@gmail.com> wrote:
> > >>> Hi,
> > >>>
> > >>> I am trying to index JSON objects (which contain nested JSON objects
> > >>> and arrays) into Solr.
> > >>>
> > >>> My JSON object looks like the following (this is fake data that I am
> > >>> using for this example):
> > >>>
> > >>>     {
> > >>>         "RawEventMessage": "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aliquam dolor orci, placerat ac pretium a, tincidunt consectetur mauris. Etiam sollicitudin sapien id odio tempus, non sodales odio iaculis. Donec fringilla diam at placerat interdum. Proin vitae arcu non augue facilisis auctor id non neque. Integer non nibh sit amet justo facilisis semper a vel ligula. Pellentesque commodo vulputate consequat. ",
> > >>>         "EventUid": "1279706565",
> > >>>         "TimeOfEvent": "2015-05-01-08-07-13",
> > >>>         "TimeOfEventUTC": "2015-05-01-01-07-13",
> > >>>         "EventCollector": "kafka",
> > >>>         "EventMessageType": "kafka-@column",
> > >>>         "User": {
> > >>>             "User": "Lorem ipsum",
> > >>>             "UserGroup": "Manager",
> > >>>             "Location": "consectetur adipiscing",
> > >>>             "Department": "Legal"
> > >>>         },
> > >>>         "EventDescription": {
> > >>>             "EventApplicationName": "",
> > >>>             "Query": "SELECT * FROM MOVIES",
> > >>>             "Information": [
> > >>>                 {
> > >>>                     "domainName": "English",
> > >>>                     "columns": [
> > >>>                         { "movieName": "Casablanca", "duration": "154" },
> > >>>                         { "movieName": "Die Hard", "duration": "127" }
> > >>>                     ]
> > >>>                 },
> > >>>                 {
> > >>>                     "domainName": "Hindi",
> > >>>                     "columns": [
> > >>>                         { "movieName": "DDLJ", "duration": "176" }
> > >>>                     ]
> > >>>                 }
> > >>>             ]
> > >>>         }
> > >>>     }
> > >>>
> > >>> My function for indexing the object is as follows:
> > >>>
> > >>>     public static void indexJSON(JSONObject jsonOBJ) throws ParseException,
> > >>>             IOException, SolrServerException {
> > >>>         Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
> > >>>
> > >>>         SolrInputDocument mainEvent = new SolrInputDocument();
> > >>>         mainEvent.addField("id", generateID());
> > >>>         mainEvent.addField("RawEventMessage", jsonOBJ.get("RawEventMessage"));
> > >>>         mainEvent.addField("EventUid", jsonOBJ.get("EventUid"));
> > >>>         mainEvent.addField("EventCollector", jsonOBJ.get("EventCollector"));
> > >>>         mainEvent.addField("EventMessageType", jsonOBJ.get("EventMessageType"));
> > >>>         mainEvent.addField("TimeOfEvent", jsonOBJ.get("TimeOfEvent"));
> > >>>         mainEvent.addField("TimeOfEventUTC", jsonOBJ.get("TimeOfEventUTC"));
> > >>>
> > >>>         Object obj = parser.parse(jsonOBJ.get("User").toString());
> > >>>         JSONObject userObj = (JSONObject) obj;
> > >>>
> > >>>         SolrInputDocument childUserEvent = new SolrInputDocument();
> > >>>         childUserEvent.addField("id", generateID());
> > >>>         childUserEvent.addField("User", userObj.get("User"));
> > >>>
> > >>>         obj = parser.parse(jsonOBJ.get("EventDescription").toString());
> > >>>         JSONObject eventdescriptionObj = (JSONObject) obj;
> > >>>
> > >>>         SolrInputDocument childEventDescEvent = new SolrInputDocument();
> > >>>         childEventDescEvent.addField("id", generateID());
> > >>>         childEventDescEvent.addField("EventApplicationName",
> > >>>                 eventdescriptionObj.get("EventApplicationName"));
> > >>>         childEventDescEvent.addField("Query", eventdescriptionObj.get("Query"));
> > >>>
> > >>>         obj = JSONValue.parse(eventdescriptionObj.get("Information").toString());
> > >>>         JSONArray informationArray = (JSONArray) obj;
> > >>>
> > >>>         for (int i = 0; i < informationArray.size(); i++) {
> > >>>             JSONObject domain = (JSONObject) informationArray.get(i);
> > >>>
> > >>>             SolrInputDocument domainDoc = new SolrInputDocument();
> > >>>             domainDoc.addField("id", generateID());
> > >>>             domainDoc.addField("domainName", domain.get("domainName"));
> > >>>
> > >>>             String s = domain.get("columns").toString();
> > >>>             obj = JSONValue.parse(s);
> > >>>             JSONArray ColumnsArray = (JSONArray) obj;
> > >>>
> > >>>             SolrInputDocument columnsDoc = new SolrInputDocument();
> > >>>             columnsDoc.addField("id", generateID());
> > >>>
> > >>>             for (int j = 0; j < ColumnsArray.size(); j++) {
> > >>>                 JSONObject ColumnsObj = (JSONObject) ColumnsArray.get(j);
> > >>>                 SolrInputDocument columnDoc = new SolrInputDocument();
> > >>>                 columnDoc.addField("id", generateID());
> > >>>                 columnDoc.addField("movieName", ColumnsObj.get("movieName"));
> > >>>                 columnsDoc.addChildDocument(columnDoc);
> > >>>             }
> > >>>             domainDoc.addChildDocument(columnsDoc);
> > >>>             childEventDescEvent.addChildDocument(domainDoc);
> > >>>         }
> > >>>
> > >>>         mainEvent.addChildDocument(childEventDescEvent);
> > >>>         mainEvent.addChildDocument(childUserEvent);
> > >>>         batch.add(mainEvent);
> > >>>         solr.add(batch);
> > >>>         solr.commit();
> > >>>     }
> > >>>
> > >>> When I try to index using the above code, I am able to index only 12
> > >>> objects per second. Is there a faster way to do the indexing? I believe
> > >>> I am using the json-fast parser, which is one of the fastest parsers
> > >>> for JSON.
> > >>>
> > >>> Your help will be very valuable to me.
> > >>>
> > >>> Thanks,
> > >>> Vineeth
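(Taken together, the batching advice in this thread can be sketched as follows. This is an illustration, not the poster's code: flushing is driven by the batch's size rather than a loop-counter modulus, so the first `solr.add()` is not a single-document batch and no empty batch is ever sent. `countFlushes` is a hypothetical helper that counts how many `solr.add(batch)` calls the loop would make; the SolrJ calls are left as comments so the sketch stays self-contained:)

```java
import java.util.ArrayList;
import java.util.List;

public class BatchIndexer {
    // Counts how many times solr.add(batch) would be called for nDocs
    // documents when flushing on batch size. Contrast with the thread's
    // `if (i % 1000 == 0)` check, which fires on the very first document
    // (i == 0) and sends a one-document first batch.
    static int countFlushes(int nDocs, int batchSize) {
        List<String> batch = new ArrayList<>();
        int flushes = 0;
        for (int i = 0; i < nDocs; i++) {
            batch.add("doc-" + i);            // stands in for one SolrInputDocument
            if (batch.size() == batchSize) {
                flushes++;                    // solr.add(batch) would go here
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            flushes++;                        // final partial batch
        }
        // A single solr.commit() would follow here, or none at all if
        // autoCommit is configured in solrconfig.xml.
        return flushes;
    }

    public static void main(String[] args) {
        System.out.println(countFlushes(2500, 1000)); // prints 3
    }
}
```

For 2500 documents this yields two full batches plus one final partial batch, and exactly one commit for the whole job.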