That's pretty interesting. Forgot to mention, the output I get is:

--3--
--4--
--5--
--6--
--7--

So it does look like something is up with Java.

Young


On Mon, Mar 31, 2014 at 5:05 PM, ghufran malik <[email protected]> wrote:

> Hmm yeah, the only difference between your system and mine is the Hadoop
> version you're using, and maybe the JDK. I think it's most likely something
> to do with the JDK in this respect.
>
>
> On Mon, Mar 31, 2014 at 10:01 PM, ghufran malik
> <[email protected]> wrote:
>
>> The output your code produced is:
>>
>> --3--
>> --4--
>> ----
>> ----
>> ----
>> --5--
>> ----
>> ----
>> ----
>> --6--
>> ----
>> ----
>> ----
>> --7--
>>
>> It's because of the space between the \t and the closing ] in [\t ]. With
>> it, the pattern also splits on every single space, whereas [\t] on its own
>> splits only on tabs.
>>
>> Thanks for clearing that up for me!
>>
>> Ghufran
>>
>>
>> On Mon, Mar 31, 2014 at 9:50 PM, ghufran malik
>> <[email protected]> wrote:
>>
>>> Hey,
>>>
>>> Yes, when originally debugging the code I thought to check what \t
>>> actually splits on, and created my own test class:
>>>
>>> import java.util.regex.Pattern;
>>>
>>> class App
>>> {
>>>     private static final Pattern SEPARATOR = Pattern.compile("[\t ]");
>>>     public static void main( String[] args )
>>>     {
>>>         String line = "1 0 2";
>>>         String[] tokens = SEPARATOR.split(line.toString());
>>>
>>>         System.out.println(SEPARATOR);
>>>         System.out.println(tokens.length);
>>>
>>>         for(String token : tokens){
>>>
>>>             System.out.println(token);
>>>         }
>>>     }
>>> }
>>>
>>> and the pattern split on tabs as I thought it should.
>>>
>>> I'll try your test as well to double-check.
>>>
>>>
>>> On Mon, Mar 31, 2014 at 9:34 PM, Young Han <[email protected]> wrote:
>>>
>>>> Weird, inputs with tabs work for me right out of the box. Either the
>>>> "\t" is not the cause or it's some Java-version-specific issue.
>>>> Try this toy program:
>>>>
>>>>
>>>> import java.util.regex.Pattern;
>>>>
>>>> public class Test {
>>>>     public static void main(String[] args) {
>>>>         Pattern SEPARATOR = Pattern.compile("[\t ]");
>>>>         String[] tokens = SEPARATOR.split("3 4 5 6 7");
>>>>
>>>>         for (int i = 0; i < tokens.length; i++) {
>>>>             System.out.println("--" + tokens[i] + "--");
>>>>         }
>>>>     }
>>>> }
>>>>
>>>>
>>>> Does it split the tabs properly for your Java?
>>>>
>>>> Young
>>>>
>>>>
>>>> On Mon, Mar 31, 2014 at 4:19 PM, ghufran malik <[email protected]> wrote:
>>>>
>>>>> Yep, you're right, it is a bug with all the InputFormats, I believe. I just
>>>>> checked it with the Giraph 1.1.0 jar using the IntIntNullVertexInputFormat
>>>>> and the example ConnectedComponents class, and it worked like a charm with
>>>>> just the normal spacing.
>>>>>
>>>>>
>>>>> On Mon, Mar 31, 2014 at 9:15 PM, Young Han <[email protected]> wrote:
>>>>>
>>>>>> Huh, it might be a bug in the code. Could it be that Pattern.compile
>>>>>> has to take "[\\t ]" (note the double backslash) to properly match tabs?
>>>>>> If so, that bug is in all the input formats...
>>>>>>
>>>>>> Happy to help :)
>>>>>>
>>>>>> Young
>>>>>>
>>>>>>
>>>>>> On Mon, Mar 31, 2014 at 4:07 PM, ghufran malik <[email protected]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I removed the spaces and it worked! I don't understand, though.
>>>>>>> Surely the separator pattern means that it splits on tabs?
>>>>>>>
>>>>>>> Thanks for all your help though, somewhat relieved now!
>>>>>>>
>>>>>>> Kind regards,
>>>>>>>
>>>>>>> Ghufran
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Mar 31, 2014 at 8:15 PM, Young Han <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> That looks like an error with the algorithm... What do the Hadoop
>>>>>>>> userlogs say?
>>>>>>>>
>>>>>>>> And just to rule out weirdness, what happens if you use spaces
>>>>>>>> instead of tabs (for your input graph)?
>>>>>>>>
>>>>>>>> Young
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Mar 31, 2014 at 2:04 PM, ghufran malik <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hey,
>>>>>>>>>
>>>>>>>>> No, even after I added the .txt it gets to map 100%, then drops back
>>>>>>>>> down to 50% and gives me the error:
>>>>>>>>>
>>>>>>>>> 14/03/31 18:22:56 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
>>>>>>>>> 14/03/31 18:22:56 WARN job.GiraphConfigurationValidator: Output format vertex index type is not known
>>>>>>>>> 14/03/31 18:22:56 WARN job.GiraphConfigurationValidator: Output format vertex value type is not known
>>>>>>>>> 14/03/31 18:22:56 WARN job.GiraphConfigurationValidator: Output format edge value type is not known
>>>>>>>>> 14/03/31 18:22:56 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
>>>>>>>>> 14/03/31 18:22:57 INFO mapred.JobClient: Running job: job_201403311622_0004
>>>>>>>>> 14/03/31 18:22:58 INFO mapred.JobClient: map 0% reduce 0%
>>>>>>>>> 14/03/31 18:23:16 INFO mapred.JobClient: map 50% reduce 0%
>>>>>>>>> 14/03/31 18:23:19 INFO mapred.JobClient: map 100% reduce 0%
>>>>>>>>> 14/03/31 18:33:25 INFO mapred.JobClient: map 50% reduce 0%
>>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Job complete: job_201403311622_0004
>>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Counters: 6
>>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Job Counters
>>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=1238858
>>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
>>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
>>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Launched map tasks=2
>>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
>>>>>>>>> 14/03/31 18:33:30 INFO mapred.JobClient: Failed map tasks=1
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I did a check to make sure the graph was being stored correctly by
>>>>>>>>> doing:
>>>>>>>>>
>>>>>>>>> ghufran@ghufran:~/Downloads/hadoop-0.20.203.0/bin$ hadoop dfs -cat input/*
>>>>>>>>> 1 2
>>>>>>>>> 2 1 3 4
>>>>>>>>> 3 2
>>>>>>>>> 4 2
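For reference, here is a minimal Java sketch of the splitting behaviour debated in this thread, assuming a standard JDK java.util.regex engine. In a Java string literal, "[\t ]" already contains a real tab character, while "[\\t ]" hands the regex escape \t to Pattern; both character classes match exactly one tab or one space. Because the class matches a single character, each extra separator in a run (for example, a tab that became several spaces when code was pasted into an email) produces an empty token, which would account for the "----" lines in the output quoted in the thread. Appending + is one way to treat a whole run as a single separator. The class name SplitDemo and the sample strings are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Giraph.
public class SplitDemo {
    // "[\t ]": the Java compiler turns \t into a literal tab inside the class.
    static final Pattern JAVA_ESCAPE = Pattern.compile("[\t ]");
    // "[\\t ]": the regex engine itself interprets \t as a tab.
    static final Pattern REGEX_ESCAPE = Pattern.compile("[\\t ]");
    // "[\t ]+": one or more tabs/spaces in a row count as a single separator.
    static final Pattern RUN = Pattern.compile("[\t ]+");

    public static void main(String[] args) {
        String tabbed = "3\t4\t5";   // single tabs between values
        String spaced = "3 4    5";  // one space, then a run of four spaces

        // Both escape styles split tab-separated input identically.
        System.out.println(Arrays.toString(JAVA_ESCAPE.split(tabbed)));
        System.out.println(Arrays.toString(REGEX_ESCAPE.split(tabbed)));

        // A single-character class leaves empty tokens between consecutive separators.
        System.out.println(Arrays.toString(JAVA_ESCAPE.split(spaced)));

        // With the + quantifier the run collapses and no empty tokens remain.
        System.out.println(Arrays.toString(RUN.split(spaced)));
    }
}
```

Running it should print [3, 4, 5] twice, then [3, 4, , , , 5], then [3, 4, 5].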

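The input graph printed by hadoop dfs -cat in this thread uses one line per vertex: a vertex id followed by the ids of its neighbours. Below is a minimal sketch of parsing that layout, assuming integer ids and splitting on \s+ so that single tabs, single spaces, or runs of either are all accepted. AdjacencyLineParser and its Vertex holder are hypothetical names for illustration; this is not the code of Giraph's IntIntNullVertexInputFormat.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical parser for "vertexId neighbour neighbour ..." lines; not Giraph code.
public class AdjacencyLineParser {
    // \s+ treats any run of whitespace (tabs or spaces) as one separator.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Parsed form of one input line: a vertex id plus its neighbour ids. */
    public static final class Vertex {
        final int id;
        final List<Integer> neighbours;

        Vertex(int id, List<Integer> neighbours) {
            this.id = id;
            this.neighbours = neighbours;
        }
    }

    public static Vertex parse(String line) {
        // trim() drops leading/trailing whitespace so split() yields no leading empty token.
        String[] tokens = WHITESPACE.split(line.trim());
        int id = Integer.parseInt(tokens[0]);
        List<Integer> neighbours = new ArrayList<>();
        for (int i = 1; i < tokens.length; i++) {
            neighbours.add(Integer.parseInt(tokens[i]));
        }
        return new Vertex(id, neighbours);
    }

    public static void main(String[] args) {
        // Same adjacency list as the "hadoop dfs -cat" output, written with mixed separators.
        String[] lines = {"1\t2", "2 1\t3  4", "3 2", "4\t\t2"};
        for (String line : lines) {
            Vertex v = parse(line);
            System.out.println(v.id + " -> " + v.neighbours);
        }
    }
}
```

The choice of \s+ over a single-character class matters here: with "[\t ]", a doubled separator yields an empty token, and Integer.parseInt("") throws NumberFormatException. That is one possible, unconfirmed way that mixed whitespace in an input file could make a map task fail.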