In Java: UUID.randomUUID(); That is what I'm using.
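
A minimal sketch of what I mean, in case it helps. The class name IdGenerator and the helper timestampID are only for illustration; generateID is the name from your code. It also shows why I suspect the timestamp version collides: in SimpleDateFormat, M is the month and s is the second (milliseconds are S), so the trailing "Ms" in "yyMMddhhmmssMs" adds no millisecond field, and any two documents built within the same second get the same id. If documents sharing an id overwrite each other, as Upayavira describes, that would match the maxdocs vs numdocs difference.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.UUID;

public class IdGenerator {

    /** Returns a random (version 4) UUID rendered as a 36-character string. */
    public static String generateID() {
        return UUID.randomUUID().toString();
    }

    /** The timestamp-based pattern from the quoted message, kept only for comparison. */
    public static String timestampID(Date when) {
        // "Ms" is parsed as month + second, not milliseconds.
        return new SimpleDateFormat("yyMMddhhmmssMs").format(when);
    }

    public static void main(String[] args) {
        // Two instants one millisecond apart, inside the same second.
        Date a = new Date(1437514680000L);
        Date b = new Date(1437514680001L);

        // prints true: the timestamp ids are identical
        System.out.println(timestampID(a).equals(timestampID(b)));

        // Two random UUIDs; a collision is practically impossible.
        System.out.println(generateID());
        System.out.println(generateID());
    }
}
```

With this, every mainEvent.addField("id", generateID()) call, including the ones for the child documents, gets its own value no matter how fast the loop runs.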
Regards

> On 21 Jul 2015, at 22:38, Vineeth Dasaraju <vineeth.ii...@gmail.com> wrote:
>
> Hi Upayavira,
>
> I guess that is the problem. I am currently using a function for generating
> an ID. It takes the current date and time to milliseconds and generates the
> id. This is the function.
>
> public static String generateID(){
>     Date dNow = new Date();
>     SimpleDateFormat ft = new SimpleDateFormat("yyMMddhhmmssMs");
>     String datetime = ft.format(dNow);
>     return datetime;
> }
>
> I believe that despite having a millisecond precision in the id generation,
> multiple objects are being assigned the same ID. Can you suggest a better
> way to generate the ID?
>
> Regards,
> Vineeth
>
>> On Tue, Jul 21, 2015 at 1:29 PM, Upayavira <u...@odoko.co.uk> wrote:
>>
>> Are you making sure that every document has a unique ID? Index into an
>> empty Solr, then look at your maxdocs vs numdocs. If they are different
>> (maxdocs is higher) then some of your documents have been deleted,
>> meaning some were overwritten.
>>
>> That might be a place to look.
>>
>> Upayavira
>>
>>> On Tue, Jul 21, 2015, at 09:24 PM, solr.user.1...@gmail.com wrote:
>>> I can confirm this behavior, seen when sending json docs in batch, never
>>> happens when sending one by one, but sporadic when sending batches.
>>>
>>> Like if Solr/Jetty drops a couple of documents out of the batch.
>>>
>>> Regards
>>>
>>>> On 21 Jul 2015, at 21:38, Vineeth Dasaraju <vineeth.ii...@gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Thank You Erick for your inputs. I tried creating batches of 1000 objects
>>>> and indexing it to solr. The performance is way better than before but I
>>>> find that the number of indexed documents that is shown in the dashboard is
>>>> less than the number of documents that I had actually indexed through
>>>> solrj.
>>>> My code is as follows:
>>>>
>>>> private static String SOLR_SERVER_URL = "http://localhost:8983/solr/newcore";
>>>> private static String JSON_FILE_PATH = "/home/vineeth/week1_fixed.json";
>>>> private static JSONParser parser = new JSONParser();
>>>> private static SolrClient solr = new HttpSolrClient(SOLR_SERVER_URL);
>>>>
>>>> public static void main(String[] args) throws IOException,
>>>>         SolrServerException, ParseException {
>>>>     File file = new File(JSON_FILE_PATH);
>>>>     Scanner scn=new Scanner(file,"UTF-8");
>>>>     JSONObject object;
>>>>     int i = 0;
>>>>     Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
>>>>     while(scn.hasNext()){
>>>>         object= (JSONObject) parser.parse(scn.nextLine());
>>>>         SolrInputDocument doc = indexJSON(object);
>>>>         batch.add(doc);
>>>>         if(i%1000==0){
>>>>             System.out.println("Indexed " + (i+1) + " objects." );
>>>>             solr.add(batch);
>>>>             batch = new ArrayList<SolrInputDocument>();
>>>>         }
>>>>         i++;
>>>>     }
>>>>     solr.add(batch);
>>>>     solr.commit();
>>>>     System.out.println("Indexed " + (i+1) + " objects." );
>>>> }
>>>>
>>>> public static SolrInputDocument indexJSON(JSONObject jsonOBJ) throws
>>>>         ParseException, IOException, SolrServerException {
>>>>     Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
>>>>
>>>>     SolrInputDocument mainEvent = new SolrInputDocument();
>>>>     mainEvent.addField("id", generateID());
>>>>     mainEvent.addField("RawEventMessage", jsonOBJ.get("RawEventMessage"));
>>>>     mainEvent.addField("EventUid", jsonOBJ.get("EventUid"));
>>>>     mainEvent.addField("EventCollector", jsonOBJ.get("EventCollector"));
>>>>     mainEvent.addField("EventMessageType", jsonOBJ.get("EventMessageType"));
>>>>     mainEvent.addField("TimeOfEvent", jsonOBJ.get("TimeOfEvent"));
>>>>     mainEvent.addField("TimeOfEventUTC", jsonOBJ.get("TimeOfEventUTC"));
>>>>
>>>>     Object obj = parser.parse(jsonOBJ.get("User").toString());
>>>>     JSONObject userObj = (JSONObject) obj;
>>>>
>>>>     SolrInputDocument childUserEvent = new SolrInputDocument();
>>>>     childUserEvent.addField("id", generateID());
>>>>     childUserEvent.addField("User", userObj.get("User"));
>>>>
>>>>     obj = parser.parse(jsonOBJ.get("EventDescription").toString());
>>>>     JSONObject eventdescriptionObj = (JSONObject) obj;
>>>>
>>>>     SolrInputDocument childEventDescEvent = new SolrInputDocument();
>>>>     childEventDescEvent.addField("id", generateID());
>>>>     childEventDescEvent.addField("EventApplicationName",
>>>>             eventdescriptionObj.get("EventApplicationName"));
>>>>     childEventDescEvent.addField("Query", eventdescriptionObj.get("Query"));
>>>>
>>>>     obj= JSONValue.parse(eventdescriptionObj.get("Information").toString());
>>>>     JSONArray informationArray = (JSONArray) obj;
>>>>
>>>>     for(int i = 0; i<informationArray.size(); i++){
>>>>         JSONObject domain = (JSONObject) informationArray.get(i);
>>>>
>>>>         SolrInputDocument domainDoc = new SolrInputDocument();
>>>>         domainDoc.addField("id", generateID());
>>>>         domainDoc.addField("domainName", domain.get("domainName"));
>>>>
>>>>         String s = domain.get("columns").toString();
>>>>         obj= JSONValue.parse(s);
>>>>         JSONArray ColumnsArray = (JSONArray) obj;
>>>>
>>>>         SolrInputDocument columnsDoc = new SolrInputDocument();
>>>>         columnsDoc.addField("id", generateID());
>>>>
>>>>         for(int j = 0; j<ColumnsArray.size(); j++){
>>>>             JSONObject ColumnsObj = (JSONObject) ColumnsArray.get(j);
>>>>             SolrInputDocument columnDoc = new SolrInputDocument();
>>>>             columnDoc.addField("id", generateID());
>>>>             columnDoc.addField("movieName", ColumnsObj.get("movieName"));
>>>>             columnsDoc.addChildDocument(columnDoc);
>>>>         }
>>>>         domainDoc.addChildDocument(columnsDoc);
>>>>         childEventDescEvent.addChildDocument(domainDoc);
>>>>     }
>>>>
>>>>     mainEvent.addChildDocument(childEventDescEvent);
>>>>     mainEvent.addChildDocument(childUserEvent);
>>>>     return mainEvent;
>>>> }
>>>>
>>>> I would be grateful if you could let me know what I am missing.
>>>>
>>>> On Sun, Jul 19, 2015 at 2:16 PM, Erick Erickson <erickerick...@gmail.com>
>>>> wrote:
>>>>
>>>>> First thing is it looks like you're only sending one document at a
>>>>> time, perhaps with child objects. This is not optimal at all. I
>>>>> usually batch my docs up in groups of 1,000, and there is anecdotal
>>>>> evidence that there may (depending on the docs) be some gains above
>>>>> that number. Gotta balance the batch size off against how big the docs
>>>>> are of course.
>>>>>
>>>>> Assuming that you really are calling this method for one doc (and
>>>>> children) at a time, the far bigger problem other than calling
>>>>> server.add for each parent/children is that you're then calling
>>>>> solr.commit() every time. This is an anti-pattern. Generally, let the
>>>>> autoCommit setting in solrconfig.xml handle the intermediate commits
>>>>> while the indexing program is running and only issue a commit at the
>>>>> very end of the job if at all.
>>>>>
>>>>> Best,
>>>>> Erick
>>>>>
>>>>> On Sun, Jul 19, 2015 at 12:08 PM, Vineeth Dasaraju
>>>>> <vineeth.ii...@gmail.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I am trying to index JSON objects (which contain nested JSON objects and
>>>>>> Arrays in them) into solr.
>>>>>>
>>>>>> My JSON Object looks like the following (This is fake data that I am using
>>>>>> for this example):
>>>>>>
>>>>>> {
>>>>>>     "RawEventMessage": "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aliquam dolor orci, placerat ac pretium a, tincidunt consectetur mauris. Etiam sollicitudin sapien id odio tempus, non sodales odio iaculis. Donec fringilla diam at placerat interdum. Proin vitae arcu non augue facilisis auctor id non neque. Integer non nibh sit amet justo facilisis semper a vel ligula. Pellentesque commodo vulputate consequat. ",
>>>>>>     "EventUid": "1279706565",
>>>>>>     "TimeOfEvent": "2015-05-01-08-07-13",
>>>>>>     "TimeOfEventUTC": "2015-05-01-01-07-13",
>>>>>>     "EventCollector": "kafka",
>>>>>>     "EventMessageType": "kafka-@column",
>>>>>>     "User": {
>>>>>>         "User": "Lorem ipsum",
>>>>>>         "UserGroup": "Manager",
>>>>>>         "Location": "consectetur adipiscing",
>>>>>>         "Department": "Legal"
>>>>>>     },
>>>>>>     "EventDescription": {
>>>>>>         "EventApplicationName": "",
>>>>>>         "Query": "SELECT * FROM MOVIES",
>>>>>>         "Information": [
>>>>>>             {
>>>>>>                 "domainName": "English",
>>>>>>                 "columns": [
>>>>>>                     {
>>>>>>                         "movieName": "Casablanca",
>>>>>>                         "duration": "154"
>>>>>>                     },
>>>>>>                     {
>>>>>>                         "movieName": "Die Hard",
>>>>>>                         "duration": "127"
>>>>>>                     }
>>>>>>                 ]
>>>>>>             },
>>>>>>             {
>>>>>>                 "domainName": "Hindi",
>>>>>>                 "columns": [
>>>>>>                     {
>>>>>>                         "movieName": "DDLJ",
>>>>>>                         "duration": "176"
>>>>>>                     }
>>>>>>                 ]
>>>>>>             }
>>>>>>         ]
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> My function for indexing the object is as follows:
>>>>>>
>>>>>> public static void indexJSON(JSONObject jsonOBJ) throws ParseException,
>>>>>>         IOException, SolrServerException {
>>>>>>     Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
>>>>>>
>>>>>>     SolrInputDocument mainEvent = new SolrInputDocument();
>>>>>>     mainEvent.addField("id", generateID());
>>>>>>     mainEvent.addField("RawEventMessage", jsonOBJ.get("RawEventMessage"));
>>>>>>     mainEvent.addField("EventUid", jsonOBJ.get("EventUid"));
>>>>>>     mainEvent.addField("EventCollector", jsonOBJ.get("EventCollector"));
>>>>>>     mainEvent.addField("EventMessageType", jsonOBJ.get("EventMessageType"));
>>>>>>     mainEvent.addField("TimeOfEvent", jsonOBJ.get("TimeOfEvent"));
>>>>>>     mainEvent.addField("TimeOfEventUTC", jsonOBJ.get("TimeOfEventUTC"));
>>>>>>
>>>>>>     Object obj = parser.parse(jsonOBJ.get("User").toString());
>>>>>>     JSONObject userObj = (JSONObject) obj;
>>>>>>
>>>>>>     SolrInputDocument childUserEvent = new SolrInputDocument();
>>>>>>     childUserEvent.addField("id", generateID());
>>>>>>     childUserEvent.addField("User", userObj.get("User"));
>>>>>>
>>>>>>     obj = parser.parse(jsonOBJ.get("EventDescription").toString());
>>>>>>     JSONObject eventdescriptionObj = (JSONObject) obj;
>>>>>>
>>>>>>     SolrInputDocument childEventDescEvent = new SolrInputDocument();
>>>>>>     childEventDescEvent.addField("id", generateID());
>>>>>>     childEventDescEvent.addField("EventApplicationName",
>>>>>>             eventdescriptionObj.get("EventApplicationName"));
>>>>>>     childEventDescEvent.addField("Query", eventdescriptionObj.get("Query"));
>>>>>>
>>>>>>     obj= JSONValue.parse(eventdescriptionObj.get("Information").toString());
>>>>>>     JSONArray informationArray = (JSONArray) obj;
>>>>>>
>>>>>>     for(int i = 0; i<informationArray.size(); i++){
>>>>>>         JSONObject domain = (JSONObject) informationArray.get(i);
>>>>>>
>>>>>>         SolrInputDocument domainDoc = new SolrInputDocument();
>>>>>>         domainDoc.addField("id", generateID());
>>>>>>         domainDoc.addField("domainName", domain.get("domainName"));
>>>>>>
>>>>>>         String s = domain.get("columns").toString();
>>>>>>         obj= JSONValue.parse(s);
>>>>>>         JSONArray ColumnsArray = (JSONArray) obj;
>>>>>>
>>>>>>         SolrInputDocument columnsDoc = new SolrInputDocument();
>>>>>>         columnsDoc.addField("id", generateID());
>>>>>>
>>>>>>         for(int j = 0; j<ColumnsArray.size(); j++){
>>>>>>             JSONObject ColumnsObj = (JSONObject) ColumnsArray.get(j);
>>>>>>             SolrInputDocument columnDoc = new SolrInputDocument();
>>>>>>             columnDoc.addField("id", generateID());
>>>>>>             columnDoc.addField("movieName", ColumnsObj.get("movieName"));
>>>>>>             columnsDoc.addChildDocument(columnDoc);
>>>>>>         }
>>>>>>         domainDoc.addChildDocument(columnsDoc);
>>>>>>         childEventDescEvent.addChildDocument(domainDoc);
>>>>>>     }
>>>>>>
>>>>>>     mainEvent.addChildDocument(childEventDescEvent);
>>>>>>     mainEvent.addChildDocument(childUserEvent);
>>>>>>     batch.add(mainEvent);
>>>>>>     solr.add(batch);
>>>>>>     solr.commit();
>>>>>> }
>>>>>>
>>>>>> When I try to index using the above code, I am able to index only 12
>>>>>> objects per second. Is there a faster way to do the indexing? I believe I
>>>>>> am using the json-fast parser, which is one of the fastest parsers for
>>>>>> JSON.
>>>>>>
>>>>>> Your help will be very valuable to me.
>>>>>>
>>>>>> Thanks,
>>>>>> Vineeth
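
One more note on the batching loop in the quoted main(): the check i%1000==0 already fires at i == 0, so the first request carries a single document, and the final "Indexed (i+1)" print is one too high. No documents should be lost because of that, but the logged counts are misleading when you compare them with the dashboard. Below is a small sketch of size-based batching with one flush at the end. Batcher and its sink are hypothetical names of mine, not part of SolrJ; in the real program the sink would presumably wrap solr.add(batch), with a single solr.commit() after the last flush, as Erick suggested.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Collects items and hands them to a sink in batches of a fixed size. */
public class Batcher<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private List<T> buffer = new ArrayList<>();
    private int total = 0;

    public Batcher(int batchSize, Consumer<List<T>> sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Buffers one item and sends the buffer once it holds batchSize items. */
    public void add(T item) {
        buffer.add(item);
        total++;
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Sends whatever is buffered; call once after the last add(). */
    public void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(buffer);
        buffer = new ArrayList<>();
    }

    /** Number of items accepted so far. */
    public int total() {
        return total;
    }

    public static void main(String[] args) {
        List<Integer> sizes = new ArrayList<>();
        // Stand-in sink that only records batch sizes; a real sink would
        // call solr.add(batch) and handle its checked exceptions.
        Batcher<String> batcher = new Batcher<>(1000, batch -> sizes.add(batch.size()));
        for (int i = 0; i < 2500; i++) {
            batcher.add("doc" + i);
        }
        batcher.flush();
        // prints [1000, 1000, 500] total=2500
        System.out.println(sizes + " total=" + batcher.total());
    }
}
```

Counting inside add() keeps the reported total equal to the number of documents actually handed over, which makes the comparison with numdocs straightforward.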