Hmm yeah, the only differences between my system and yours are the Hadoop version you're using and maybe the JDK. I think it's most likely something to do with the JDK.
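
For reference, here is a minimal sketch of how the separator pattern from this thread behaves on a standard JDK, as far as I know. The class name SeparatorDemo, the split helper and the input string are made up for illustration; only Pattern.compile and Pattern.split come from the code quoted below. It compares "[\t ]" with "[\\t ]", and shows where the empty "----" tokens come from.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SeparatorDemo {
    // Hypothetical helper: compile the regex and split the line,
    // the same two calls the toy programs in this thread use.
    static String[] split(String regex, String line) {
        return Pattern.compile(regex).split(line);
    }

    public static void main(String[] args) {
        // Assumed input: one tab between 3 and 4, two spaces between 4 and 5.
        String line = "3\t4  5";

        // "[\t ]": javac turns \t into a real tab, so the class is {tab, space}.
        System.out.println(Arrays.toString(split("[\t ]", line)));

        // "[\\t ]": the regex engine sees the escape \t, which also means tab.
        System.out.println(Arrays.toString(split("[\\t ]", line)));

        // Adding + treats a run of separators as a single separator.
        System.out.println(Arrays.toString(split("[\t ]+", line)));
    }
}
```

The first two patterns should give the same tokens, with an empty token between the two consecutive spaces, and the third should give just 3, 4 and 5. If that holds on your machine too, the single versus double backslash is probably not what differs between our setups.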

On Mon, Mar 31, 2014 at 10:01 PM, ghufran malik <[email protected]> wrote:

> the output your code produced is:
>
> --3--
> --4--
> ----
> ----
> ----
> --5--
> ----
> ----
> ----
> --6--
> ----
> ----
> ----
> --7--
>
> it's because of the space between the \t and closing ] in [\t ]. This will
> separate output by a space. Whereas if you just have [\t] it will separate
> this out using tab spacing.
>
> Thanks for clearing that up for me!
>
> Ghufran
>
>
> On Mon, Mar 31, 2014 at 9:50 PM, ghufran malik <[email protected]> wrote:
>
>> Hey,
>>
>> Yes when originally debugging the code I thought to check what \t
>> actually split by and created my own test class:
>>
>> import java.util.regex.Pattern;
>>
>> class App
>> {
>>     private static final Pattern SEPARATOR = Pattern.compile("[\t ]");
>>     public static void main( String[] args )
>>     {
>>         String line = "1 0 2";
>>         String[] tokens = SEPARATOR.split(line.toString());
>>
>>         System.out.println(SEPARATOR);
>>         System.out.println(tokens.length);
>>
>>         for(String token : tokens){
>>
>>             System.out.println(token);
>>         }
>>     }
>> }
>>
>> and the pattern worked as I thought it should by tab spaces.
>>
>> I'll try your test as well to double check
>>
>>
>> On Mon, Mar 31, 2014 at 9:34 PM, Young Han <[email protected]> wrote:
>>
>>> Weird, inputs with tabs work for me right out of the box. Either the
>>> "\t" is not the cause or it's some Java-version specific issue. Try this
>>> toy program:
>>>
>>>
>>> import java.util.regex.Pattern;
>>>
>>> public class Test {
>>>   public static void main(String[] args) {
>>>     Pattern SEPARATOR = Pattern.compile("[\t ]");
>>>     String[] tokens = SEPARATOR.split("3 4 5 6 7");
>>>
>>>     for (int i = 0; i < tokens.length; i++) {
>>>       System.out.println("--" + tokens[i] + "--");
>>>     }
>>>   }
>>> }
>>>
>>>
>>> Does it split the tabs properly for your Java?
>>>
>>> Young
>>>
>>>
>>> On Mon, Mar 31, 2014 at 4:19 PM, ghufran malik
>>> <[email protected]> wrote:
>>>
>>>> Yep, you're right, it is a bug with all the InputFormats I believe. I just
>>>> checked it with the Giraph 1.1.0 jar using the IntIntNullVertexInputFormat
>>>> and the example ConnectedComponents class and it worked like a charm with
>>>> just the normal spacing.
>>>>
>>>>
>>>> On Mon, Mar 31, 2014 at 9:15 PM, Young Han <[email protected]> wrote:
>>>>
>>>>> Huh, it might be a bug in the code. Could it be that Pattern.compile
>>>>> has to take "[\\t ]" (note the double backslash) to properly match tabs?
>>>>> If so, that bug is in all the input formats...
>>>>>
>>>>> Happy to help :)
>>>>>
>>>>> Young
>>>>>
>>>>>
>>>>> On Mon, Mar 31, 2014 at 4:07 PM, ghufran malik <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I removed the spaces and it worked! I don't understand though. I'm
>>>>>> sure the separator pattern means that it splits it by tab spaces?
>>>>>>
>>>>>> Thanks for all your help though, somewhat relieved now!
>>>>>>
>>>>>> Kind regards,
>>>>>>
>>>>>> Ghufran
>>>>>>
>>>>>>
>>>>>> On Mon, Mar 31, 2014 at 8:15 PM, Young Han <[email protected]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> That looks like an error with the algorithm... What do the Hadoop
>>>>>>> userlogs say?
>>>>>>>
>>>>>>> And just to rule out weirdness, what happens if you use spaces
>>>>>>> instead of tabs (for your input graph)?
>>>>>>>
>>>>>>> Young
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Mar 31, 2014 at 2:04 PM, ghufran malik <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hey,
>>>>>>>>
>>>>>>>> No, even after I added the .txt it gets to map 100% then drops back
>>>>>>>> down to 50 and gives me the error:
>>>>>>>>
>>>>>>>> 14/03/31 18:22:56 INFO utils.ConfigurationUtils: No edge input
>>>>>>>> format specified. Ensure your InputFormat does not require one.
>>>>>>>> 14/03/31 18:22:56 WARN job.GiraphConfigurationValidator: Output
>>>>>>>> format vertex index type is not known
>>>>>>>> 14/03/31 18:22:56 WARN job.GiraphConfigurationValidator: Output
>>>>>>>> format vertex value type is not known
>>>>>>>> 14/03/31 18:22:56 WARN job.GiraphConfigurationValidator: Output
>>>>>>>> format edge value type is not known
>>>>>>>> 14/03/31 18:22:56 INFO job.GiraphJob: run: Since checkpointing is
>>>>>>>> disabled (default), do not allow any task retries (setting
>>>>>>>> mapred.map.max.attempts = 0, old value = 4)
>>>>>>>> 14/03/31 18:22:57 INFO mapred.JobClient: Running job:
>>>>>>>> job_201403311622_0004
>>>>>>>> 14/03/31 18:22:58 INFO mapred.JobClient: map 0% reduce 0%
>>>>>>>> 14/03/31 18:23:16 INFO mapred.JobClient: map 50% reduce 0%
>>>>>>>> 14/03/31 18:23:19 INFO mapred.JobClient: map 100% reduce 0%
>>>>>>>> 14/03/31 18:33:25 INFO mapred.JobClient: map 50% reduce 0%
>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Job complete:
>>>>>>>> job_201403311622_0004
>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Counters: 6
>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Job Counters
>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient:
>>>>>>>> SLOTS_MILLIS_MAPS=1238858
>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Total time spent by
>>>>>>>> all reduces waiting after reserving slots (ms)=0
>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Total time spent by
>>>>>>>> all maps waiting after reserving slots (ms)=0
>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Launched map tasks=2
>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Failed map tasks=1
>>>>>>>>
>>>>>>>>
>>>>>>>> I did a check to make sure the graph was being stored correctly by
>>>>>>>> doing:
>>>>>>>>
>>>>>>>> ghufran@ghufran:~/Downloads/hadoop-0.20.203.0/bin$ hadoop dfs -cat
>>>>>>>> input/*
>>>>>>>> 1 2
>>>>>>>> 2 1 3 4
>>>>>>>> 3 2
>>>>>>>> 4 2
